Friday, April 25, 2008

More Pronoun Rankings

In response to my response to his post on pronouns, Mr. Tweedy has more to say on the subject. Specifically, he acknowledges, based on my evidence, that the underlying problem of pronoun ordering in conjunctive phrases is complex - though of course the point of his original post was just some random observations about certain limited colloquial forms, so no harm no foul.

To get at the solution, he (helpfully) goes through all the combinations of pronouns joined by "and" in subject position - marking each on a well-defined scale of 1 to 5, where 1 is (roughly) "most acceptable" and 5 is "not at all acceptable." (Actually, it's not quite a full factorial typology: he leaves out cases like "I and me" where forms matching in person and number are differentiated by case alone. This is understandable, since discourse situations that involve these forms are rare. If I were going to do a real study on this, I'm not sure whether I would choose to include them or not. Probably I would, for completeness, but there's no doubt they introduce a confound.)

Based on this, he concludes that

It looks like, if you're only going to inflect one, then it's better to put the nominative inflection on the first (and so structurally nearest, if you're assuming a conjunctive phrase joining the second to the first) item and leave it off the second than it is to do it the other way around.

He goes on to add that nominative marking on the first item is optional in colloquial speech, and that nominative can apparently spread to the second item in a conjunctive phrase, but that it's generally not a good idea to mark the second item nominative if the first item isn't so marked.

But he acknowledges that he still lacks an explanation for why things like "*I and Frank went to the movies" are so terrible. After all, this has a nominative item structurally preceding an item that's ambiguous for case.

So there are still some pieces missing - and in fact I think I have an idea what they are.

Taking Mr. Tweedy's ratings as input, I wrote a short program in Python to organize the data around features. (I hasten to add that I definitely disagree with a number of his ratings: for example, (j1) "We and they went to the movies" is REALLY BAD for me, but it gets a clean bill of health from Mr. Tweedy.) Specifically, the program takes all the input cases and scores a "violation" for each particular ordering based on the rating Mr. Tweedy gave it. So, for example, for "I and him went to the movies" (Mr. Tweedy's item b6), which Mr. Tweedy gives a 5 (I agree), the program adds 5 to the total violation score for each of 1 >> 3, n >> o, and s >> s, where 1 and 3 are obviously person markers, "n" and "o" are "nominative" and "objective" respectively, and "s >> s" just means that a singular form preceded a singular form. "You" was obviously a bit tricky to handle this way, but since this is just a sweep for fun over a single "subject's" grammaticality judgements, I simply encoded it as "b" (for "both") for singular/plural and nominative/objective. Some more thought is probably warranted there.
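For the curious, the tallying described above might look roughly like this. It is a minimal sketch, not the actual program: the FEATURES table, the tally_violations name, and the input format are all assumptions made for illustration, and the only ratings fed in are the two quoted in this post (5 for "I and him," 1 for "We and they").

```python
from collections import Counter

# Hypothetical feature table: pronoun -> (person, case, number).
# "b" ("both") marks "you" as ambiguous for case and number, as described above.
FEATURES = {
    "I": ("1", "n", "s"), "me": ("1", "o", "s"),
    "we": ("1", "n", "p"), "us": ("1", "o", "p"),
    "you": ("2", "b", "b"),
    "he": ("3", "n", "s"), "him": ("3", "o", "s"),
    "she": ("3", "n", "s"), "her": ("3", "o", "s"),
    "they": ("3", "n", "p"), "them": ("3", "o", "p"),
}


def tally_violations(ratings):
    """Add each rating to the ordering it instantiates on every feature dimension."""
    totals = {"person": Counter(), "case": Counter(), "number": Counter()}
    for (first, second), rating in ratings:
        # Dict keys iterate in insertion order, matching the (person, case, number) tuples.
        for dim, a, b in zip(totals, FEATURES[first], FEATURES[second]):
            totals[dim][f"{a} >> {b}"] += rating
    return totals


# Only the two ratings quoted in this post; the full data set is Mr. Tweedy's.
ratings = [(("I", "him"), 5), (("we", "they"), 1)]
totals = tally_violations(ratings)
for dim, counts in totals.items():
    print(dim, dict(counts))
```

Each sentence contributes its rating three times, once per feature dimension, which is why a single bad sentence like "I and him" pushes up 1 >> 3, n >> o, and s >> s all at once.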

In any case, these are the results (where high scores are worse - like in golf).

For singular vs. plural:

b >> s: 7
s >> b: 9
b >> p: 12
p >> b: 14

p >> p: 22
s >> s: 35
s >> p: 36
p >> s: 38

So it doesn't look like there's much of an effect for singular vs. plural. It seems to be better to put singulars before plurals, and "you" before either (maybe?), but the effect isn't very strong. The only thing of interest here is that singular seems to be bad in general. I mean, if you have it paired with a plural, then better to put it first, but best of all is don't have it. So, proposed OT constraints (because I do think this problem warrants a constraint-ranking solution, and NOT a derivational/rule-ordered solution) would be:


  • *SINGULAR (no idea how this one actually functions, of course!)

For nominative vs. objective:

b >> n: 8
n >> b: 10
b >> o: 11
o >> b: 13

n >> n: 22
o >> o: 25
n >> o: 33
o >> n: 51

So Mr. Tweedy's speculation that the first element should be nominative is CORRECT. However, it seems that better still is just to have both in the same case. So right, nominative before objective, but there is also a case concord preference, and it seems to override the "nominative first" rule: better to have both items in objective case (25) than the first in nominative and the second in objective (33). (Note: I'll bet, however, that there's an exception to this: don't have two of the same form one after the other, i.e. no "he and he went...") "You" was again problematic. It seems best to put "you" before items of either case (stress on "seems"; there's no real way to know what's going on with "you" and case).
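The two claims about case can be checked directly against the totals. This is just a sketch over the four unambiguous rows of the table above; the variable names are mine, and the raw totals are not normalized for how many sentences fed each cell.

```python
# Case totals copied from the table above (the ambiguous "b" rows for "you" are left out).
CASE_SCORES = {"n >> n": 22, "o >> o": 25, "n >> o": 33, "o >> n": 51}

# Concord: both items share a case. Contour: the case changes across the conjunction.
concord = {k: v for k, v in CASE_SCORES.items() if k[0] == k[-1]}
contour = {k: v for k, v in CASE_SCORES.items() if k[0] != k[-1]}

# Even the worst concord pairing scores lower (better) than the best contour pairing.
concord_wins = max(concord.values()) < min(contour.values())
# Among contours, nominative-first scores lower than objective-first.
nominative_first_wins = CASE_SCORES["n >> o"] < CASE_SCORES["o >> n"]

print(concord_wins, nominative_first_wins)  # True True
```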



But (and this is the piece of the puzzle that Mr. Tweedy seems to be missing in his most recent analysis) the most striking result had to do with person.

2 >> 3: 7
3 >> 2: 7
2 >> 1: 12
1 >> 2: 16
3 >> 1: 41
1 >> 3: 58

This seems to show the clearest preferences. 2nd and 3rd person don't seem to care which order they come in relative to each other. But both 2nd and 3rd person like to come before 1st person, and this is especially so for 3-1 pairings. If you simply add up the totals for each in the first position, then 2nd scores a 19, 3rd scores a 48, and 1st scores a whopping 74. Of course, if we do it the other way round and look at second position, 2nd has a 23, 3rd a 65, and 1st a 53, so one could make a case the other way (i.e. 3rd strongly dislikes being last, 1st too, and 2nd doesn't much mind). But there are reasons to think that the ranking is, in fact, 2 >> 3 >> 1. This is because, taken as an ordering on pairs, 2 is definitely supposed to come first. It fares better than either 1 or 3 on individual pairings. If you think of it then as a problem of ordering the other two, having established the ranking for 2, then 3 is clearly supposed to precede 1.
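The arithmetic in that paragraph is easy to reproduce. Here is a small sketch (the function names are made up for illustration) that sums the person scores by position and then asks, pair by pair, which order is cheaper:

```python
from collections import defaultdict

# Person totals copied from the table above: (first, second) -> violation score.
PERSON_SCORES = {
    ("2", "3"): 7, ("3", "2"): 7,
    ("2", "1"): 12, ("1", "2"): 16,
    ("3", "1"): 41, ("1", "3"): 58,
}


def position_totals(scores):
    """Sum the violation scores each person accrues in first and in second position."""
    first, second = defaultdict(int), defaultdict(int)
    for (a, b), score in scores.items():
        first[a] += score
        second[b] += score
    return dict(first), dict(second)


def cheaper_order(a, b, scores=PERSON_SCORES):
    """Return the lower-scoring order of a and b, or None if the two orders tie."""
    ab, ba = scores[(a, b)], scores[(b, a)]
    if ab == ba:
        return None
    return (a, b) if ab < ba else (b, a)


first, second = position_totals(PERSON_SCORES)
print(first["2"], first["3"], first["1"])     # 19 48 74
print(second["2"], second["3"], second["1"])  # 23 65 53
print(cheaper_order("2", "3"))  # None (a tie at 7)
print(cheaper_order("2", "1"))  # ('2', '1')
print(cheaper_order("3", "1"))  # ('3', '1')
```

The pairwise view is the one the 2 >> 3 >> 1 ranking leans on: 1st person loses to both of the others, while 2nd and 3rd tie with each other, so placing 2 ahead of 3 rests on 2's lower scores overall rather than on a head-to-head win.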

And that, in short, is the answer to why "I and Frank went to the movies" is so bad, even though Mr. Tweedy is right that nominative should precede objective (keeping in mind, of course, that we don't really know what case "Frank" is in - the point is just that there shouldn't in general be a problem with nominative forms coming first). It's bad because there's another constraint that says "don't put 1st person in front of 3rd person." This equally explains cases like "He and I went to the movies" being acceptable whereas "I and he went to the movies" is bad, even though both are nominative-marked. It's also, presumably, why I would have given "We and they went to the movies" a 5 (though I admit I'm puzzled why "They and we went to the movies" is so terrible - a 4 on my personal scale). I suspect Mr. Tweedy actually agrees. Marking that sentence 1 was no doubt a case of hypercorrection on his part (after all, these constraints are based on HIS judgments).

I'm willing to bet, in fact, that the constraint on person is more important than the constraint on case. But I'm hitting my lazy zone, so I'm going to leave it aside for now.

A couple of things to keep in mind with all this. First, these ratings really only reflect Mr. Tweedy's personal grammar. Since he is a native English speaker, it's highly likely that it agrees in large part with everyone else's grammars - but of course the only way to tell for sure is to survey a bunch of people. That is, in fact, something that I would like to do eventually - but not just now. Second, this is only the tip of the iceberg. There are other conjunctions besides "and," for example, and I wonder whether the pattern is different for different conjunctions. Third, no care was taken here to balance out the typology. To do this kind of survey for real, I would have to make sure that each form showed up in the test the same number of times - which is obviously a headache when you're dealing with hugely ambiguous forms like "you."

But it's nothing if not interesting. I declare the problem solved to my personal satisfaction. Nominative should precede objective; 2nd person should precede 3rd person, which should precede 1st; and the person ordering is (I'm betting) more important than the case ordering. The ban on case contours explains cases like "He and me went to the movies" being worse than "He and I went to the movies." The rock in my craw is "They and we went to the movies," which is just BAD. Better than "We and they...," but honestly not by much. However, I think there's an obvious explanation for this one that has nothing to do with the constraint ranking: simple semantics. "We and they" implies "we," and so most people no doubt choose to say "we" (or possibly "we all") in these cases. The constraint here is just *POINTLESS CONSTRUCTIONS.