The Only Winning Move: The Dynamics of Pointless Arguments

This week in Philosophical Foundations we covered the Dynamical Systems approach to cognition. It's not an area I know much about, though I would like to know more. I guess that Dynamical Systems is a more serious competitor to the traditional symbolic approaches than Connectionism is.

One of the assigned papers - an article in Trends in Cognitive Sciences by Randall Beer - dealt with just this question: namely how Dynamical Approaches compare to the Symbolic and Connectionist approaches. As it was only a light 9-page article attempting to introduce a complicated subject to the uninitiated, there would be no point in coming down too hard on it or in expecting it to anticipate and answer all objections. However, being a Linguist, I did get a little bit annoyed at some of the focus on dynamical approaches to language modeling.

The example of language is brought up on the first page, in fact:

Language, that quintessentially human skill, is often seen as the strongest argument against non-symbolic approaches to cognition. However, despite its symbolic character, language understanding and production are essentially temporal events, with preceding words strongly influencing the interpretation or selection of later ones.

This is my favorite non-argument for dynamical approaches: that because x proceeds in time, we must necessarily make time one of the variables in our model of x. In itself, there's nothing wrong with this. Time may or may not turn out to be an important variable for modeling a given system, but it doesn't hurt to try to work it in and see where it leads you - with that much I agree. Where it gets irritating is how flexible the Dynamical Systems people are about what they accept as "time." Here, notice, linear ordering is an adequate definition. The idea that one state precedes and exerts some influence on the next is what makes this "dynamic." Well, fine, then language is "dynamic." The point, surely, is that no one has ever disputed this. In fact, I will go on record right here and now saying that there isn't a single serious syntactic theory among the current proposals in which word order plays no role. (To be fair, LFG started out trying to remove linearity, because there is approximately one language in the world that seems to lack linear ordering rules completely - though even this case is disputed.) So, making a possible allowance for LFG, most serious syntactic theories place heavy emphasis on incremental parsing and stepwise operations on states in sentence construction. They were, you might say, "dynamical" long before Port and van Gelder put out their book. And these theories are, of course, all held up as canonical examples of hopelessly symbolic approaches, as is clear from the quotation above.

In fact, it isn't just syntactic theories that can be so described. As far as I know, all models of computation involve stepwise transformations between states. They all proceed "in time" according to this definition. Beer admits as much in the box on p.92:

Sets of differential or difference equations, cellular automata, finite state machines and Turing machines are all examples of dynamical systems.

And if Turing machines are dynamical systems, then under current assumptions all computational models are dynamical systems, since the Church-Turing thesis holds that "Every 'function which would naturally be regarded as computable' can be computed by a Turing machine." In other words, if we accept as our definition of time a mere ordering of states, each of which exerts an influence on the next, then all symbolic systems proposed for cognition are clearly instances of dynamical systems, and there is no conflict between these approaches.
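To make the point concrete, here is a minimal sketch (my own toy illustration, not anything from Beer's paper) of a finite-state machine written as a discrete-time dynamical system: the state at step t+1 is a function of the state at step t and the current input, x[t+1] = f(x[t], u[t]). The particular machine, a hypothetical recognizer for strings containing an even number of "a"s, is an assumption chosen only because it is small.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical transition table: (current state, input symbol) -> next state.
# The toy machine tracks whether it has seen an even or odd number of "a"s.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}


def step(state: str, symbol: str) -> str:
    """One tick of the update rule x[t+1] = f(x[t], u[t])."""
    return TRANSITIONS[(state, symbol)]


def trajectory(start: str, inputs: Iterable[str]) -> List[str]:
    """Iterate the update rule and return the ordered sequence of states visited."""
    states = [start]
    for symbol in inputs:
        states.append(step(states[-1], symbol))
    return states


def accepts(inputs: Iterable[str]) -> bool:
    """A string is accepted if its trajectory ends in the 'even' state."""
    return trajectory("even", inputs)[-1] == "even"


print(trajectory("even", "abab"))  # ['even', 'odd', 'odd', 'even', 'even']
```

Nothing here is continuous, yet under the "ordered states, each influencing the next" definition it already counts as a dynamical system: the only notion of time is the index into the list of states.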

I am satisfied with that characterization. But Beer evidently is not, as he devotes a section of his paper to explaining how dynamical systems differ from symbolic and connectionist approaches. This section, in my opinion, is a classic example of selective (as opposed to "accurate") employment of descriptive phrases to achieve a desired effect on the reader. For example:

A typical symbolic model is expressed as a program that takes as input a symbolic description of a problem to be solved. Then, using the system's general knowledge about the domain in which it operates (also symbolically represented), this description is manipulated in a purely syntactic fashion in order to obtain a solution to the problem. (p. 96)

Now, this is indeed a workable characterization of "a typical symbolic model," but the emphasis on "problem solving" is manipulative. Beer can get away with this because the most prominent examples of symbol systems are logic machines, theorem provers and such. Since humans tend to associate "logic" with "homework" that they were assigned in high school, we do indeed speak of "logic problems" and so on. But I have never been under the impression that logicians think of it that way - at least not anymore. These days, logicians think more in terms of the aforementioned "purely syntactic transformations," moving from equivalent form to equivalent form. The "problem to be solved" isn't really one of extracting meaning from the symbols - that's for humans to do - but rather of transforming a complicated set of such symbols into another set that is easier for humans to understand. Logical insights consist almost entirely of (and are currently characterized in terms of) noticing that some forms "reduce" to others. And of course once we get away from automatic deduction, the analogy with "solving problems" becomes even looser. The GUI display on my computer is very much implemented as a symbolic system in program form, but it is not clear that it is solving any particular problem. More pertinent: when we say that people parse sentences in linear order we can, as we choose, speak of them solving "an interpretation problem" or of simply "processing input."
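The idea of moving "from equivalent form to equivalent form" can be illustrated with a tiny rewriting system. This is a toy sketch, not a real theorem prover: it assumes propositional formulas are represented as nested tuples and applies a handful of equivalence-preserving rules (double negation, plus the identity, domination and idempotence laws for AND/OR) until nothing more reduces. No meaning is consulted anywhere; the rules only match shapes.

```python
from typing import Union

# A formula is True/False, a variable name, or a tuple:
#   ("not", f), ("and", f, g), ("or", f, g)
Formula = Union[bool, str, tuple]


def rewrite_once(f: Formula) -> Formula:
    """Apply the simplification rules once at each node, bottom-up."""
    if not isinstance(f, tuple):
        return f
    op, *args = f
    args = [rewrite_once(a) for a in args]  # simplify sub-formulas first
    if op == "not":
        (a,) = args
        if isinstance(a, tuple) and a[0] == "not":  # not not p  =>  p
            return a[1]
        if isinstance(a, bool):  # not True => False, not False => True
            return not a
        return ("not", a)
    a, b = args
    if op == "and":
        if a is False or b is False:  # p and False => False
            return False
        if a is True:  # True and p => p
            return b
        if b is True:  # p and True => p
            return a
        if a == b:  # p and p => p
            return a
    if op == "or":
        if a is True or b is True:  # p or True => True
            return True
        if a is False:  # False or p => p
            return b
        if b is False:  # p or False => p
            return a
        if a == b:  # p or p => p
            return a
    return (op, a, b)


def normalize(f: Formula) -> Formula:
    """Rewrite repeatedly until the formula stops changing (a fixed point)."""
    while True:
        g = rewrite_once(f)
        if g == f:
            return g
        f = g


print(normalize(("and", ("not", ("not", "p")), True)))  # p
```

The "solution" here is just a shorter equivalent string of symbols; deciding what it means is left to whoever reads it.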

Beer's favorite term for characterizing dynamical systems is "perturbations." Rather than "input" and "processing," we get things like this:

On this view, inputs do not uniquely specify an internal state that describes some external state of affairs. Rather, they serve as a source of perturbations to the system's intrinsic dynamics. (p.96)

This sounds impressive, but it's not clear to me that it says anything different from the characterization of symbol systems above. The bit about "uniquely specifying an internal state" is directly misleading. Inputs failing to uniquely specify the internal state is no more characteristic of dynamical systems than of symbol systems (though unique specification does happen to hold for the logic machines that Beer was apparently trying to invoke with his choice of adjectives). Virtually no symbol system of requisite complexity allows its input to "uniquely specify" the internal state. Rather, the input, coupled with the rules specified by the current internal state, produces a transition to the appropriate next state - or an output, perhaps, as required.

Again with language: if we look at the HPSG theory of syntax, we see a symbolic system that incrementally builds a syntactic object. Items are encoded as sets of (syntactic and semantic) features whose values are either specified or empty. Words combine with each other by unifying their feature matrices. Incompatible feature sets result in a crash (ungrammaticality). Features that are specified on one item but not on the other become specified for both. The combination of two items results in a new item whose feature matrix is the unification of both. This is then incrementally combined with the next item in the sentence, and so on. At each stage, the specification of the constructed object's features forms what we could call a state. Inputs (the next word or already-formed syntactic object) do not get to control this state. Rather, they are either accepted or rejected by the state depending on what came before, and if accepted they form a new state in accordance with unification. Grammaticality is encoded as the ability to reach any of a number of acceptable end states: if one of these states is reached, we determine that we have a grammatical sentence.

It's true that it doesn't typically occur to anyone to describe such a process as involving "perturbations," but nothing in it fails to meet the characterization of a "dynamical system" that Beer gives early in the paper: it proceeds through input-driven changes in state over a series of steps (i.e. "in time").
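The HPSG-style process described above can be sketched in a few lines. This is a deliberately crude toy, not HPSG proper: real HPSG feature structures are typed, nested and allow structure sharing, and real HPSG builds phrases rather than one flat structure per sentence. Here each item is assumed to be a flat dictionary of features, with a missing key standing in for an unspecified value, and the feature names and lexical entries are invented for illustration. What it does show is the state-transition reading: the accumulated feature structure is the state, each new word either unifies with it (yielding a new state) or causes a crash, and grammaticality means ending in an acceptable state.

```python
from typing import Callable, Dict, List, Optional

# A toy "feature structure": feature name -> value. A missing key means unspecified.
Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two flat feature structures; return None if any feature clashes."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None  # incompatible specifications: the parse crashes
        result[feature] = value  # specified on one side only: now specified for both
    return result


def parse(words: List[Features], is_end_state: Callable[[Features], bool]) -> bool:
    """Combine words left to right; the running feature structure is the state."""
    state: Features = {}
    for word in words:
        next_state = unify(state, word)
        if next_state is None:
            return False  # the current state rejects this input
        state = next_state
    return is_end_state(state)  # grammatical only if we end in an acceptable state


# Hypothetical lexical entries with invented features.
THE = {"def": "+"}
DOG = {"num": "sg", "per": "3"}
DOGS = {"num": "pl", "per": "3"}
SLEEPS = {"num": "sg", "per": "3", "tense": "pres"}
SLEEP = {"num": "pl", "tense": "pres"}


def has_finite_verb(state: Features) -> bool:
    """Toy end-state condition: some word supplied a tense value."""
    return "tense" in state


print(parse([THE, DOG, SLEEPS], has_finite_verb))   # True
print(parse([THE, DOGS, SLEEPS], has_finite_verb))  # False: "num" clash
print(parse([THE, DOG], has_finite_verb))           # False: no acceptable end state
```

Each word is exactly an input-driven change of state, which is all the early definition of a "dynamical system" asks for.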

If the only substance of the dispute between dynamical approaches and symbolic approaches is going to be how we characterize the state transition function and things like whether we are more likely to produce the word "perturbation" or "program," then I really don't see the point in splitting hairs over it.

Beer does, though, for reasons similar to those I gave in response to Andy Clark's "extended mind" hypothesis:

However, great care must be taken in generalizing these notions. If any internal state is a representation and any systematic process is a computation, then a computational theory of mind loses its force. (p.97)

That's true enough, although we could just as easily say the same about his characterization of "time" earlier: if any ordered sequence represents "time" and any representation is a "state," then the dynamical systems theory of mind loses its force.

I don't mean to be flippant. There are important differences in representation here, but I don't think Beer does them justice in focusing on descriptive choices like "perturbations" vs. "programs" to carry the weight of his argument. In other places, he does considerably better:

...a [dynamical] system's internal state does not necessarily have a straightforward interpretation as a representation of an external state of affairs. Rather, at each instant in time, the internal state specifies the effects that a given perturbation can have on the unfolding trajectory. (p.97)

This strikes me as much closer to a sensible level on which to hold the debate. What we're arguing about isn't really whether or not cognitive processes unfold in time (of course they do!), but what level of semantic transparency we expect from our internal representations, and indeed whether the processes we're modeling should be characterized in terms of themselves (i.e. in terms of the objects on which they operate) or in terms of some more general model (vector calculus). Strictly speaking, all symbolic systems (at least, those relevant to cognition) are dynamical systems - a point Beer made effectively early in the paper and then seemed to want to abandon.
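Beer's remark that "the internal state specifies the effects that a given perturbation can have on the unfolding trajectory" can be illustrated with a one-variable system. The sketch below is my own toy, not a model from the paper: a single self-exciting unit, x[t+1] = tanh(w * x[t] + u[t]), where the weight w = 2 and the pulse sizes are arbitrary assumptions. With w = 2 the unforced system has two stable fixed points (about +0.96 and -0.96) and an unstable one at 0, so the effect of an identical input depends entirely on the state it arrives in.

```python
import math
from typing import List


def run(x0: float, inputs: List[float], w: float = 2.0) -> List[float]:
    """Iterate x[t+1] = tanh(w * x[t] + u[t]) and return the whole trajectory.

    With w = 2 the unforced map has two attractors (about +/-0.96),
    so the state 'remembers' which basin it is in.
    """
    xs = [x0]
    for u in inputs:
        xs.append(math.tanh(w * xs[-1] + u))
    return xs


def pulse(strength: float, settle_steps: int = 20) -> List[float]:
    """A single input pulse followed by a period with no input."""
    return [strength] + [0.0] * settle_steps


print(run(-1.0, pulse(3.0))[-1] > 0)  # True: the pulse flips the state
print(run(+1.0, pulse(3.0))[-1] > 0)  # True: same pulse, state was already high
print(run(-1.0, pulse(1.0))[-1] > 0)  # False: a weaker pulse is absorbed
```

The input does not "specify" the resulting state; it nudges a trajectory whose fate is decided by where the system already is.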

I do not believe the debate as I have just characterized it can or should be resolved. Semantic transparency is a desirable thing; uniformity between branches of science is a desirable thing. The more semantically transparent a system is, the more symbolic (the representation of) that system is likely to be. This necessarily comes at a cost to uniformity: not all processes in the world can be explained by the same set of symbols and operations. To the extent that we wish to reduce cognition to physics, the vector-calculus-based models employed by the approaches we currently call "dynamical" are obviously appropriate. If we want to describe an operation in terms of the objects on which it operates, then clearly symbolic approaches are going to be more helpful.

It is frequently claimed that implementational questions favor "dynamical" approaches, but I do not think that this is the case at all. It depends heavily on which problem we're modeling, in fact. In Linguistics, for example, it is indisputable that symbolic approaches are inappropriate for speech recognition. They simply do not work. The function that maps external sound waves to internal phonemes has to contend with a huge amount of noise and individual variation. Attempting to characterize this process on a purely symbolic level would amount to little more than individually assigning every floating point number in the universe a linguistic-sounding name. Exactly the opposite is true of syntactic parsers. It's not inconceivable that several hundred years from now we will have the tools to map sound waves onto syntactic objects and so have some kind of "dynamical" approach to syntax. But neither is it clear that anyone would want or use such a thing. Investigating grammaticality on the level of vibrations in air molecules builds a whole lot more into the system than we need or can conceptualize. It's like letting your calculator do your long division for you: you get the right answer, but the process becomes mysterious. You cannot "know" long division by farming it out to your calculator, just as you cannot "know" syntax by building a giant computer that can correctly predict grammaticality by superhuman amounts of number crunching.

Now, people like Beer might (I haven't met him, so I don't know) want to follow this up by saying that syntax ultimately is a matter of perturbations of air molecules and associated neuronal firings - but this seems rather pointless. So what? I suppose that trees are also collections of quarks, but it does me little good to think of them that way if what I want is shade or firewood. If I want to chop down a tree, it is helpful to think in terms of axes and saws, sharpness of blade, etc. Having to go all the way down to the level of quarks would be an immense (and life-threatening, if I'm just trying to get warm) waste of time. Likewise, programming a parser works much better on the symbolic level than it does on the "dynamical" level. Just as I wouldn't expect development of speech recognition software to take a symbolic approach, I would be very surprised to see a company wasting money on neural network approaches to parsing.

The remarkable thing about Elman's network - to pick on Beer's example - isn't that it can process sentences at all but that it does such a bad job at it. It proves exactly the opposite of what it is generally taken to prove. People get excited about Elman's experiment because it's taken as an existence proof that connectionist networks can induce syntactic structures. The fact that this is in some sense surprising is meant to stand in as an argument that parsing was never a symbolic process to begin with. But that's exactly the opposite of how science generally works. Generally, as scientists we have something we want to model, we form competing models and do tests to see which one handles the data better. The model that in fact handles the data better (the symbolic model, in this case) "wins." That the "losing" model may be able to improve itself over time to again become a competitor is irrelevant. We do not throw out working models in favor of their poor cousins because of someone's personal philosophical preference or insistence that he is a prognosticator (though that individual is, of course, free to do as he likes).

Beer's analogy with Newtonian and Einsteinian physics is equally useless:

It might be argued that to distinguish dynamical approaches from [symbolic and connectionist] approaches is mistaken for any of the following reasons: dynamical models can be simulated on digital computers, dynamical models can be given a computational description, Turing machines can be described as discrete-time dynamical systems over the integers, dynamical models can be represented as connectionist networks, and at least some connectionist networks are dynamical systems. However, this is a bit like arguing that there is no difference between classical and relativistic mechanics because both are expressed using differential equations and both can be simulated on a digital computer. Clearly the important issues in that case were which differential equations were proposed, which theoretical entities those equations described, and, most importantly, the fundamentally different conceptions of space.

This, too, is backward. As I understand the difference between classical and relativistic physics (although, actually, Relativity is generally taken to be a classical theory - it's really Quantum Mechanics that Beer wants here, or else he should have said "Newtonian" instead of "classical"), the two appear to be notational variants in all but a handful of salient cases. For the most part they get the same answers, and for the purposes of everyday existence they are indistinguishable. In fact, Newtonian physics is probably more useful for humans because it is much simpler - easier to work out the equations, etc. We nevertheless conclude that it is wrong because Einstein's theory captures everything Newton's captures and more. We can devise and carry out experiments in which Newtonian physics makes wrong predictions that Einstein gets right, but we cannot devise and carry out experiments that Einstein gets wrong but the Newtonian tradition gets right. Ergo, we adopt Einstein's theory. An appropriate analogue for dynamical vs. symbolic approaches to cognition would be a case in which the two appeared to be notational variants, but there were in fact operations that dynamical approaches could describe and symbolic approaches could not. It is not at all clear that this is the case. Rather, it seems to be exactly the opposite. Dynamical and symbolic approaches appear to be radically different, but on closer inspection might turn out to be different instantiations of the same basic idea (that cognition proceeds in time). There are some things that symbolic approaches tend to handle better (e.g. parsing) and some things that fare better with dynamical systems (e.g. speech recognition).

I suppose that what Beer was going for was the economy aspect. He's hoping that symbolic approaches will turn out to be shortcuts for dynamical approaches the way that Newton's physics is a shortcut for Einstein's that works in all but a few well-defined (and extreme) cases. He may be right about that. All the same, the level of economy we gain from symbolic approaches makes them unlikely to go away. Further, while Newton's and Einstein's physics both provide explanations that are equally transparent (though perhaps not equally intuitive), symbolic approaches tend to be much more transparent than dynamical approaches. So rather than saying that symbolic approaches are mere shortcuts for dynamical approaches, we will probably end up having to say that they are also clearer and closer to the appropriate level of explanation (the one humans can easily conceptualize).

In any case, Beer's characterization of symbolic-reductionist objections to dynamical systems approaches is surely unfair. It isn't merely that "dynamical models can be given a computational description." It's more than that, in fact, because it deals with a specific hypothesis that certain things which can be given a computational description are also appropriately modeled as computational descriptions. So there's a bit more riding on it than whether or not you can write a program about it! (Contrast this with, say, a weather system. It is possible to model a weather system in a computer, but no one argues that weather systems are symbolic systems. It is only certain kinds of things about which that is asserted. Beer may be right that the Church-Turing and Newell-Simon hypotheses are wrong, but it is clear that he is mischaracterizing their adherents' objections to dynamical systems by invoking the fact that both Einstein and Newton wrote their theories in calculus!)

I think if there is real substance to the debate between dynamical systems and symbolic approaches then it rests on the specification of the state table. Namely, symbolic approaches need one and dynamical systems do not. From this perspective, symbolic approaches "cheat" a bit by allowing themselves to write whatever rules are needed to correctly specify the output. Another way of putting it might be that dynamical systems know their level of generalization beforehand; symbolic systems approach full generalization by writing ever more compact systems of rules. Yet another way of putting it is to say that dynamical systems represent the output of mechanical regularity detection (the person implementing this then needs to study the dynamical system to figure out where the regularities are); symbolic systems are hardwired to test specific hypotheses about where the regularities in the system are.

If symbolic and dynamical systems are truly different, then they are different in terms of the characterization above, or else different on a continuum between "maximally uniform" and "maximally transparent." The sections of the paper where Beer was headed in this direction are very helpful. The section actually devoted to spelling out the differences between symbolic and dynamical approaches, however, seems to rest more on descriptive characterizations which don't really get at the heart of the matter.

In any case, I don't think there's much to be gained by slogging back and forth between which of these (classes of) models is more appropriate to cognition. I stand by my assertion that which models are appropriate depends on what's being modeled and that with certain problems we have reason to believe that they will always be better modeled by one approach or another (e.g. parsing and symbolic models). The paper gives reason to expect that Beer would at least partly agree with that assessment. In the "outstanding questions" section at the end, this question is featured first:

Do dynamical, connectionist or symbolic conceptions of cognition offer the most powerful foundation for cognitive science, or will a hybrid of the three approaches be necessary?

I'm not terribly sure what's meant by "hybrid," but presumably allowing researchers of various problems to choose the approach most appropriate to their individual problems qualifies.

I've probably gotten on a bit of a soapbox here, but if so it's due to a certain amount of frustration with my own field of Linguistics on these questions. There is a general perception among people who like to call themselves "Cognitive Linguists" or "Language Scientists" (rather than just "Linguists") that symbolic approaches dominate the field and are somehow intolerant of non-symbolic approaches. In my own experience, just the opposite is true. Syntacticians have no problems whatever with phoneticians using calculus to describe their theories, and yet for some reason some phoneticians seem offended that syntacticians describe syntactic phenomena in terms of rules. Indeed, they often like to say that the competence/performance divide is just a convenient way for syntacticians to shove phenomena they don't like under the rug, which is a patently absurd position for someone studying exclusively performance-related phenomena (like a phonetician) to take! (The fact that there are phoneticians at all and that they are considered linguists is proof that the field as a whole is not ignoring these phenomena.) The competence/performance divide is a simple division of labor and nothing more; nothing about it suggests in any way that both branches are not necessary to a full account of language. It is, in fact, a quite natural division of labor given that the systems produced by that division are best described using very different kinds of representations.

Sunday, October 08, 2006
