Yesterday I stumbled across this interesting article (published in The Linguistic Review but available on the web at the preceding link) by Rens Bod (winner of the 2004 Best Name in Linguistics Award) called Exemplar-Based Syntax: How to Get Productivity from Examples.
I'm not sure it completely delivers on the title. For those not in the know, there's tension in the Cognitive Sciences (which include Linguistics) between example-based explanations for cognitive behavior and symbolic explanations. The example-based approaches, called "Exemplar Theories," tend to assume that people store vast amounts of information about their past experiences in long-term memory, and that these form the basis of our cognitive behavior. So, when we say that I have a "concept" of the color green, what I have under an exemplar theory is hundreds of thousands of stored memories of having seen green. If I want to know if something is green, I compare it (subconsciously, of course) to these examples, and if it is sufficiently similar (or, under some versions, more similar to this collection than to any other coherent collection of examples), I decide that it is, in fact, green. The rival school of thought operates by storing salient characteristics and encoding decision rules for classification. (There are, of course, a great many other approaches, but these are the two opponents in the ring at the moment.) Exemplar-based theories have the advantage that they can account for lots of observed behavior that rule-based theories don't seem to be able to (more accurately - can, but only with lots of acrobatics) - especially frequency effects and subtle changes in production.

So, to give a linguistic example, one of my friends has said that I actually sound slightly southern when I come back from spending a week at home in North Carolina. In general, I only have slight traces of a southern accent, but some people say it's more noticeable when I get back from having spent time around southerners.
An exemplar theory can easily explain this through the fact that I would have, over the course of this week, stored plenty of examples of southern speech, and since my knowledge of language is ultimately formed from and instantiated in nothing but examples, it stands to reason that exposure to southern examples would drag my speech in that direction. Symbolic theories can only account for such things by spelling out ridiculous amounts of absurdly fine-grained rules, so are not as elegant on these points.
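To make the exemplar idea concrete, here is a minimal sketch of exemplar-based classification - the "is this green?" decision described above. Everything here (the color data, the similarity function, the scoring rule) is hypothetical illustration, not anything from Bod's paper or any particular exemplar model: the point is only that classification proceeds by comparing a stimulus against stored examples rather than by applying a rule.

```python
import math

# Stored "memories": hypothetical (R, G, B) observations, each with the
# label that was assigned when the experience was stored.
exemplars = [
    ((20, 200, 30), "green"),
    ((40, 180, 60), "green"),
    ((15, 220, 90), "green"),
    ((200, 30, 20), "red"),
    ((220, 60, 40), "red"),
]

def similarity(a, b):
    """Inverse-distance similarity between two color stimuli."""
    return 1.0 / (1.0 + math.dist(a, b))

def classify(stimulus):
    """Label a new stimulus by its summed similarity to each
    category's stored exemplars - no rule, just comparison."""
    scores = {}
    for example, label in exemplars:
        scores[label] = scores.get(label, 0.0) + similarity(stimulus, example)
    return max(scores, key=scores.get)

print(classify((30, 210, 50)))  # a greenish stimulus → "green"
```

Note that frequency effects fall out for free: the more green experiences are stored, the more the scores tilt toward "green" for borderline stimuli - which is exactly the accent-drift story above.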
The damning case against exemplar theories and in favor of symbolic theories has always been what's called "productivity." To continue with the previous example, language is made up of small units that are combined to form complete utterances. This is what allows for the fact that you can both produce and understand sentences that you've never heard before. If I hear the sentence "Mbala loves Owatu," even though I have never heard these names before, I understand the sentence because I understand the various parts and how they go together to form a complete utterance.
I myself don't really see the point of the debate in Linguistics. It's clear that some parts of human language show the effects that exemplar theories are good at dealing with and that these are mostly in the realm of "performance" - that is, producing vocal representations of sentences (talking) and translating the sound waves that come into your brain via the eardrum into useful information. The "competence" part of language seems to me pretty unambiguously symbolic. I'm content to leave it at that. This is far from an uncontroversial compromise. Many would claim that the competence/performance divide is itself artificial (and these tend to be the kinds of people who champion exemplar theories). As a group, they seem offended by the very idea that symbolic processing ever happens anywhere, and I'm not really sure why - but I guess they have their reasons.
Anyway - the Holy Grail for such theories in Linguistics is an account of syntactic compositionality and productivity that doesn't need a symbol system. So when Bod subtitles his paper How to Get Productivity from Examples, it tends to catch your attention.
The reason I say that it doesn't completely deliver on the title, though, is because it specifies innate composition and decomposition rules. In other words, it ascribes to the speaker the unlearned ability to decompose utterances into constituent parts and to reassemble them into new utterances. The "compositionality" part, therefore, doesn't fall out of the exemplar theory - it's taken as a prior, which amounts to giving away the farm, at least on the main point. Productivity comes from these innate rules, and not from the store of examples.
But this isn't to say that Bod's theory is useless. In fact, it has some very interesting things to say.
The theory on the whole works like this. It has a uniform theory of (syntactic) representation for utterances. It has a set of decomposition operations (dissecting utterances into words), a set of composition operations (composing words into utterances), and a probability model that computes the likelihood of hearing/producing a given utterance based on the probabilities of its component parts.
In actual practice, what we do is take a corpus of utterances and assign all possible binary trees to them (that is, group words two at a time, then group groups of words two at a time, and so on until every word in the sentence is organized into some representation of a possible structure). The probability of a given utterance (string of words) is the sum of the probabilities of all possible derivations based on this corpus of trees. Derivations are accomplished by joining subtrees together at appropriate places. The probability of a certain interpretation (i.e. a certain parse tree) for a given utterance is the probability of that tree divided by the sum of the probabilities of all attested trees for that string in the corpus.
The cleverness of this approach comes in the fact that we can get probabilities for trees for utterances that we've never seen before - by building them out of previously-seen utterances. Thus, Bod's model overcomes one of Chomsky's biggest objections to such approaches - that they would be helpless in the face of novel utterances (and novel utterances are a non-trivial section of the utterances we produce and hear) because the probabilities of such utterances would always be 0.
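The mechanics can be sketched in code. The following is a toy version of the Data-Oriented Parsing idea, not Bod's actual system: the treebank is a hypothetical two-sentence corpus, fragment probabilities are simple relative frequencies among same-root fragments, and rather than summing over all derivations I score a single hand-picked derivation. The point it illustrates is Bod's answer to the novelty objection - a string that never occurs in the corpus still gets a nonzero probability, because it can be composed out of fragments of strings that did occur.

```python
from collections import Counter
from itertools import product

# A toy treebank: trees as nested tuples (label, child, ...); leaves are words.
corpus = [
    ("S", ("NP", "Mary"), ("VP", ("V", "likes"), ("NP", "Susan"))),
    ("S", ("NP", "John"), ("VP", ("V", "sees"), ("NP", "Mary"))),
]

def frags_at(node):
    """All fragments rooted exactly at `node`: at each child, either cut
    (leaving a bare frontier nonterminal) or keep expanding."""
    label, *kids = node
    options = []
    for kid in kids:
        if isinstance(kid, str):
            options.append([kid])                   # lexical leaf: always kept
        else:
            options.append([(kid[0],)] + frags_at(kid))
    return [(label, *choice) for choice in product(*options)]

def all_fragments(tree):
    """Fragments rooted at every node of `tree`."""
    results = list(frags_at(tree))
    for kid in tree[1:]:
        if not isinstance(kid, str):
            results.extend(all_fragments(kid))
    return results

counts = Counter(f for t in corpus for f in all_fragments(t))
root_totals = Counter()
for f, c in counts.items():
    root_totals[f[0]] += c

def p(fragment):
    """Relative frequency of a fragment among same-root fragments."""
    return counts[fragment] / root_totals[fragment[0]]

def compose(frag, sub):
    """Substitute `sub` at the leftmost frontier nonterminal of `frag`."""
    done = False
    def go(node):
        nonlocal done
        if isinstance(node, str):
            return node
        if len(node) == 1 and not done and node[0] == sub[0]:
            done = True
            return sub
        return (node[0], *(go(k) for k in node[1:]))
    return go(frag)

def leaves(t):
    return [t] if isinstance(t, str) else [w for k in t[1:] for w in leaves(k)]

# One derivation of the novel sentence "Mary sees John": start from an
# S-fragment attested in the corpus, then fill its two open NP slots.
derivation = [
    ("S", ("NP",), ("VP", ("V", "sees"), ("NP",))),
    ("NP", "Mary"),
    ("NP", "John"),
]
tree = derivation[0]
prob = p(derivation[0])
for frag in derivation[1:]:
    tree = compose(tree, frag)
    prob *= p(frag)

print(" ".join(leaves(tree)), prob)  # nonzero, though the string is unseen
```

In the full model the probability of the sentence would be the sum of this product over every derivation that yields it, and the probability of one parse would be that parse's share of the total - but even this stripped-down version shows why the "probability of a novel utterance is always 0" objection doesn't go through.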
But there are some obvious objections here too. Mainly - that the model as stated so far would overgenerate - producing any number of ungrammatical utterances. This is so because there are only rudimentary category restrictions on what can combine with what (he does, actually, bring syntactic labels into this a bit, but I have glossed over that part in the interest of brevity).
To fix this problem, Bod proposes to use Lexical-Functional Grammar to constrain the set of possible trees. This, at least, allows him to bring in categorial features that wouldn't be captured by trees - like gender, number, etc.
I think this is a valuable contribution and an interesting approach, and I will definitely do some further reading on it, especially as it applies to machine translation.
I just wanted to say that I do not believe that this has solved the problem of getting productivity and compositionality out of an exemplar model. Clearly, the components of the model that give us compositionality and productivity are innately specified - given.
So what is this good for? Well, quite a lot, actually. Most importantly, it gives us an objective way to capture linguistic realities like the fact that you can't say things like (to use Bod's own example) "How many years do you have?" for "How old are you?" in English. Although "How many years do you have?" would be a fine word-for-word direct translation from many languages, and although it is a legal English sentence and would probably be understood by most native speakers in actual conversation, it just isn't the way we talk. Linguistics has had a very difficult time accounting for usage preferences on this level. We're good at getting it on the individual word and individual sound level, but not so much on accounting for preferences at the sentence level. What is especially elusive is accounting for preferences among alternative possible wordings of novel sentences. Clearly, part of a speaker's knowledge of his language includes this kind of thing, and yet Linguistics is unable to account for it.
Bod's model - Data-Oriented Parsing in general (I gather he is far from the only one working on this) - seems on a first glance to have a lot of potential for solving some of these problems. So I find it very interesting and will want to learn more about it.
Some concerns - admittedly off the cuff and not really thought out.
- Adopting LFG ruins the case against UG - now, I don't personally mind. I have no problem with UG. But I can't follow Bod when he says that he's proposing we adopt "universality of representation" as opposed to "universal grammar." I fail to see the distinction. If you adopt a theory of grammar as articulated as Bresnan and Kaplan's, then you are committing yourself to a fair amount of innate knowledge of grammar. Admittedly, it's nowhere near as much as the behemoth that Chomsky and Minimalism/GB have spawned, but there are still a lot of assumptions built in. Let me just reiterate for the record: I don't mind this, and in fact I agree with the standard approach that part of the job of syntacticians is trying to figure out what mechanisms we need to assume and which we can derive. So all I'm saying here is that I don't buy the claim that this frees us from UG (though I understand that he has a book that may clear up some of these points).
- Is this an exemplar theory? - I guess it depends on what you mean by exemplar theory. It is in the sense that it's storing a lot of examples, as good exemplar theories should. But it isn't in the sense that it lets innate rules do the heavy lifting.
- Conflates grammaticality judgments with acceptability judgments - this is a big one for me. I am perfectly capable of recognizing "How many years do you have?" as grammatical, even though I don't exactly consider it discourse-acceptable. I think these are two separate concepts, and Bod's theory conflates them.
But none of these is a slam-dunk case against it. I'm enthusiastic about this on the whole.