May 28, 2006

Don't Fuck With Geoffrey K. Pullum

I'm not sure what went on behind the scenes at One Language Log Plaza to provoke this devastating take-down, but Geoff Pullum completely fucks shit up:

Certainly, it is possible that the phrase dada kraut psych mindblowing conscience expanding sublime acid oriented arcana coelestia weirdness has roughly nine stacked attributive modifiers; but one cannot really tell, because it all depends on how it is parsed: doubtless "consciousness-expanding" (I add the helpful hyphen) is intended as a syntactic unit, but one doesn't know about "kraut psych" and so on. This is basically the problem one finds with quotes from chimpanzee language: chimps are occasionally reported as having signed things with transcriptions like BANANA BANANA HELP REFRIGERATOR GIMME OPEN BANANA GIMME, and syntactically one does not really know where or whether to begin.


Part of the problem here is that Eric is one of the younger staffers here at Language Log Plaza. They work with headsets on, they have X-men posters on their walls, they talk about whether Lara Croft's breasts in the new Crystal Dynamics video game release are as big as before. The average age in their part of the building is approximately 19. They typically list their hobbies as (i) being wicked cool, (ii) dancing to their iPods in public places, (iii) shopping at American Eagle, and (iv) staying out all night. One does not see them at EVOO; they dine at places where the menu is a series of brightly colored pictures on glass with lights behind them, and often there is a neon sign in the window saying "BURRITOS AS BIG AS YOUR HEAD". And their reading material does not fully meet the criteria for being called "language".

Which raises the question: how much would you pay to see Belle Waring and Geoffrey K. Pullum in a heavyweight title bout?

Posted by todd at 6:42 PM | Comments (0)

February 18, 2006

Sketchballs

Language Log, a group blog housed on the UPenn computer science servers and headed by a linguist and CS professor at UPenn, is great for things like a brief history of insults ending in -ball and how they're used. After the history, there's this:

The Xy → Xball is not foolproof, though: silly doesn't yield *sillball, presumably because sill is not a morpheme here. And in general polysyllabic insults don't take -ball. [...] [I]t seems totally implausible to refer to someone as an idiotball -- or a bastardball or an a**holeball either. In contrast, polysyllabic nouns for nasty substances seem plausible as a base. Thus mucousball ought to work, it seems to me, even though it's not to be found in Google's index. Corpus linguistics still has some limitations, I guess.

I don't really have a lot to say about sketchballs, but I do think it's interesting that Liberman verifies most of his arguments for the validity or invalidity of a word by "argument from Google results count." That this is a reasonable thing to do seems pretty obvious to anyone who's thought even a little about computational linguistics. For me personally, it's one of those projects I always wanted to sit down and give some long, hard thought to, but I never did. More specifically, I wanted to use Google as part of a language generation tool, as a way to quantify the probability of a person using an automatically generated phrase.

Maybe one day.
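
For what it's worth, here's a rough sketch of what "quantify the probability by results count" might look like. Everything in it is an assumption: the counts are invented stand-ins rather than real search results, the function names are hypothetical, and add-one smoothing is just one simple way to keep unattested words from scoring exactly zero.

```python
# Hypothetical hit counts standing in for real search-engine results.
# These numbers are invented for illustration, not measured.
HIT_COUNTS = {
    "sketchball": 12000,
    "idiotball": 0,
    "bastardball": 0,
    "mucousball": 0,
}


def plausibility(phrase, counts, smoothing=1.0):
    """Return a smoothed relative frequency for `phrase` among the candidates.

    Adding `smoothing` to every count keeps unattested phrases above zero,
    so "not in the index" reads as "unlikely" rather than "impossible".
    """
    total = sum(counts.values()) + smoothing * len(counts)
    return (counts.get(phrase, 0) + smoothing) / total


def rank_candidates(counts):
    """Sort candidate phrases from most to least plausible."""
    return sorted(counts, key=lambda p: plausibility(p, counts), reverse=True)


if __name__ == "__main__":
    for phrase in rank_candidates(HIT_COUNTS):
        print(f"{phrase}: {plausibility(phrase, HIT_COUNTS):.6f}")
```

A real version would need an actual source of counts and some way to compare phrases of different lengths; this only shows the scoring step.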

On a related note, via AI-Complete, here's a neat-looking paper called "Automatic Meaning Discovery Using Google."

Posted by todd at 1:25 PM | Comments (0)