The most philosophically interesting thing about Combinatory Categorial Grammar (CCG) is that it permits the use of functional composition (FC), not merely functional application (FA), in the computation of the meanings of complex constituents. Abstracting away from most of the details, here’s the lowdown on FC and FA in CCG, borrowed from Mark Steedman’s “Combinatory Grammars and Parasitic Gaps” (1987).

Functional application:

1. X/Y:f   Y:y   =>  X:f(y)

In English: a node of syntactic type X/Y combines with a node of type Y to yield a node of syntactic type X. The first node is a functional node with semantic value f; it looks rightward (this is what the forward slash means) for an argument of syntactic type Y and yields something of syntactic type X as output. The second is an argument node with semantic value y. The semantic value of the result is what you get by applying f to y.

2. Y:y   X\Y:f   =>  X:f(y)

Functional composition: (only two of the four FC rules are shown here)

1. X/Y:f   Y/Z:g   =>  X/Z:λz.f[g(z)]

2. Y\Z:g   X\Y:f   =>  X\Z:λz.f[g(z)]
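For concreteness, the four rules shown above can be sketched in Python. This is only an illustration: the names `Atom`, `Slash`, `Node` and the four rule functions are made up for the sketch, and semantic values are modelled as plain Python functions. Categories follow the result-first notation used above, so X\Y looks leftward for a Y and yields an X.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    name: str                      # e.g. "S", "NP"


@dataclass(frozen=True)
class Slash:
    result: "Cat"                  # the X in X/Y or X\Y
    arg: "Cat"                     # the Y in X/Y or X\Y
    forward: bool                  # True for X/Y, False for X\Y


Cat = Union[Atom, Slash]


@dataclass(frozen=True)
class Node:
    cat: Cat
    sem: object                    # a value, or a one-place Python function


def forward_apply(left: Node, right: Node) -> Optional[Node]:
    # X/Y:f   Y:y   =>  X:f(y)
    c = left.cat
    if isinstance(c, Slash) and c.forward and c.arg == right.cat:
        return Node(c.result, left.sem(right.sem))
    return None


def backward_apply(left: Node, right: Node) -> Optional[Node]:
    # Y:y   X\Y:f   =>  X:f(y)
    c = right.cat
    if isinstance(c, Slash) and not c.forward and c.arg == left.cat:
        return Node(c.result, right.sem(left.sem))
    return None


def forward_compose(left: Node, right: Node) -> Optional[Node]:
    # X/Y:f   Y/Z:g   =>  X/Z: λz.f[g(z)]
    f, g = left.cat, right.cat
    if (isinstance(f, Slash) and isinstance(g, Slash)
            and f.forward and g.forward and f.arg == g.result):
        return Node(Slash(f.result, g.arg, True),
                    lambda z: left.sem(right.sem(z)))
    return None


def backward_compose(left: Node, right: Node) -> Optional[Node]:
    # Y\Z:g   X\Y:f   =>  X\Z: λz.f[g(z)]
    g, f = left.cat, right.cat
    if (isinstance(f, Slash) and isinstance(g, Slash)
            and not f.forward and not g.forward and f.arg == g.result):
        return Node(Slash(f.result, g.arg, False),
                    lambda z: right.sem(left.sem(z)))
    return None
```

Each function returns `None` when the two categories don't fit the rule, so a caller can simply try the rules in turn.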

I won’t go into the benefits of adding functional composition to our computational repertoire, though there are a bunch. Instead I want to raise a question about it by looking at a derivation for the following sentence (note: ‘VP’ is sometimes used as an abbreviation for the more cumbersome ‘S\NP’).

(1) [NP Marcel] [S\NP [(S\NP)/VP might] [VP [VP/NP prove] [NP completeness]]]

The important point is that there are two ways to do the derivation in the CCG framework. The first uses just the functional application rules, as follows: (1) prove and completeness combine by FA to yield a VP node; (2) the VP node combines with might to yield another VP node; (3) this last VP node combines with Marcel to yield a sentence.

The second uses the first FC rule, as follows: (1) might and prove combine by FC to yield a node of type (S\NP)/NP (i.e., a transitive verb node) with semantic value λxλy.might(prove x)y; (2) the result of step (1) combines with completeness to yield a VP node, with semantic value λy.might(prove completeness)y; (3) this combines with Marcel as above.
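To see that the two derivations really do land on the same semantic value, here is a minimal sketch in Python. The toy lexicon is an assumption made for illustration: meanings are curried functions that build nested tuples standing in for logical forms, and `compose` plays the role of the first FC rule.

```python
def compose(f, g):
    # First FC rule, semantics only: λz.f[g(z)]
    return lambda z: f(g(z))


def prove(obj):
    # VP/NP: takes an object, returns a VP meaning (a function of the subject)
    return lambda subj: ("prove", obj, subj)


def might(vp):
    # (S\NP)/VP: takes a VP meaning, returns another VP meaning
    return lambda subj: ("might", vp(subj))


# Derivation 1: functional application only.
vp_inner = prove("completeness")        # (1) prove + completeness
vp_outer = might(vp_inner)              # (2) might + VP
fa_result = vp_outer("marcel")          # (3) Marcel + VP

# Derivation 2: compose first, then apply.
might_prove = compose(might, prove)     # (1) might + prove, a transitive-verb meaning
vp_fc = might_prove("completeness")     # (2) apply to completeness
fc_result = vp_fc("marcel")             # (3) Marcel + VP

print(fa_result == fc_result)           # True
print(fa_result)                        # ('might', ('prove', 'completeness', 'marcel'))
```

Nothing in the output records which route was taken, which is exactly what makes the question below worth asking.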

My question’s pretty simple: which way is the right way? My gut tells me that a proponent of CCG should have some kind of answer to this question. If she doesn’t, it’s a strike (though not necessarily a major one) against the theory. Other things equal, it’s preferable to have a syntactic theory where the correct computation for some tokened piece of syntax isn’t overdetermined by the theory.

Simon rejects the question, arguing that both ways are right. Just as there are a number of methods you can use to correctly compute 1 + 1 + 3 (since addition is associative and commutative), there are a number of ways to compute Marcel might prove completeness.

But obviously more needs to be said. The ways of computing 1 + 1 + 3 are all, in an intuitive sense, equally good, so long as they’ve got sufficient generality. But CCG doesn’t aspire to be merely as good as traditional transformational grammars that only have access to the FA rules. It claims to be better, in large part because it is able to do away with such inherently suspect notions as LF, unrealized syntactic constituents (esp. variables), and so on. [-- EDIT: Apparently I should spell this out a bit more. Proponents of CCG say that in the case of certain sentences -- not (1), but rather (and especially) sentences containing relative clauses, coordinative conjunction, etc. -- the right computation will not be an FA computation. This is the source of a crucial disanalogy between arithmetic and CCG. To take a concrete instance, CCG says the right computation in the case of a sentence like (2) is the FC computation.

(2) Sharon will and Marcel might prove completeness

According to the proponents of CCG, the FC computation is right because it doesn't require us to posit unrealized variable traces and the like at the level of LF. This prompts further questions about the nature of the theoretical advantage associated with the elimination of LF, which the rest of the post is devoted to fleshing out. These questions simply do not arise with the different methods of computing 1+1+3 -- since no one seriously contends that any way of computing this sum has any theoretical advantages over any other -- which is why the worries raised in the rest of the body of the post do not push us toward saying anything analogous about arithmetic. --]
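A rough sketch of how an FC computation of (2) might go, semantics only, in Python. Two ingredients here are assumptions that go beyond the rules listed at the top of the post: a type-raising step for the subjects (NP:a => S/(S\NP): λf.f(a)) and a coordination rule that conjoins two constituents of the same type. The toy lexicon builds nested tuples as stand-ins for logical forms.

```python
def compose(f, g):
    # Forward composition: X/Y:f  Y/Z:g  =>  X/Z: λz.f[g(z)]
    return lambda z: f(g(z))


def type_raise(a):
    # Assumed rule: NP:a  =>  S/(S\NP): λf.f(a)
    return lambda f: f(a)


def conj(left, right):
    # Assumed coordination rule for two S/VP constituents
    return lambda vp: ("and", left(vp), right(vp))


def prove(obj):                 # VP/NP
    return lambda subj: ("prove", obj, subj)


def will(vp):                   # (S\NP)/VP
    return lambda subj: ("will", vp(subj))


def might(vp):                  # (S\NP)/VP
    return lambda subj: ("might", vp(subj))


sharon_will = compose(type_raise("sharon"), will)      # S/VP
marcel_might = compose(type_raise("marcel"), might)    # S/VP
both = conj(sharon_will, marcel_might)                 # S/VP
both_prove = compose(both, prove)                      # S/NP
sentence = both_prove("completeness")                  # S

print(sentence)
# ('and', ('will', ('prove', 'completeness', 'sharon')),
#         ('might', ('prove', 'completeness', 'marcel')))
```

The thing to notice is that "Sharon will" and "Marcel might" are built as constituents in their own right, so the shared verb phrase is supplied once, at the end, with no trace or variable standing in for it.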

At this point, I think the proponent of CCG faces a dilemma. Either the fact that CCG is able to do away with these suspect notions is a reason to accept the theory, or it isn’t. If it isn’t, then CCG’s stated advantage over competitor syntactic frameworks evaporates. (Ok, not really much of a dilemma.)

But if it is, then we need to be clearer about what exactly this advantage consists in. Why precisely is it good to have a close formal parallel between the metalanguage of syntactic theory and the object language phenomena that the theory is supposed to be about?

Here’s a response that’s clearly a non-starter: syntactic theory has no ontological commitments. It doesn’t claim to model the form of anything in the world. But if this were the case, then how could you construe the elimination of LF and unrealized constituents as an advantage of CCG? CCG might be less cumbersome (in virtue of the close formal parallels between it and the object language), which would perhaps be of some use in building better parsers, but this would be a practical advantage of CCG, not a theoretical one. In order to construe the elimination of LF as a theoretical advantage of CCG, I think you need to view CCG as being, in some sense, ontologically committed.

You might say — indeed, I think Simon is inclined to say — that the object language is an abstract, non-mental object that as a matter of fact has a certain form, and it is the goal of syntactic theory to model that form by studying the apparent form of its tokens. On this response, CCG is preferable to other grammars because, other things equal, it’s more plausible to suppose that the form of the abstract object language resembles the surface form of its tokens than it is to suppose otherwise. I can’t make any sense of this though, since I don’t know what it would be for an abstract object to have a form in the first place. But even if this were sensible, it’s not clear how we could ever decide which method of computing (1) is correct. If both methods were computationally adequate, how could we decide between them? Since there is no direct way of studying the form of the abstract object language, it is difficult to imagine a procedure for obtaining evidence in either direction.

So I think that response is out. Another possible response — the best one, I think — is that the subject matter of syntactic theory is mental in character. (Sociological aside: a number of people — Chomsky definitely, Fodor et al. probably — appear to believe that this is the case.) Syntactic theory uses a static formal language to, in some sense, represent the dynamic parsing of language that occurs in the brain. If this is the right way of viewing syntax, then it’s clear why CCG has a theoretical advantage over other grammars. It’s obviously simpler to suppose that the brain’s parser interprets surface form directly than to suppose that it first transforms surface form into LF and then does interpretation. It’s also clear how we decide which method of computation for (1) is the correct one. We look at tokenings of (1) in ordinary discourse and try to discern which parsing protocol English speakers tend to deploy in parsing (1).

However, if this is the right way of understanding syntax, then CCG turns out, in one respect, to be more ontologically committed than competitor grammars, since it presupposes that there are neural correlates of two kinds of parsing protocols and that these neural correlates form two distinct neural natural kinds. This is a significant commitment — one that proponents of the mentalistic construal of CCG shouldn’t take lightly.


  1. i’ll write a more substantive response on my blog, but for now, let me note a few problems with your post.

    “The important point is that there are two ways to do the derivation in the CCG framework.”

    this is incorrect. given that you have type-lifting in CCG, there are actually infinitely many ways to do a given derivation in CCG. you’re either wrong or in violation of a gricean maxim here.
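A small Python sketch of why type-lifting multiplies derivations. It assumes the usual definition of lifting, λf.f(a), and uses a made-up tuple encoding for the VP meaning from (1); each further lift gives another route to the same result.

```python
def lift(a):
    # Type-lifting (assumed definition): a  =>  λf.f(a)
    return lambda f: f(a)


def vp(subj):
    # Stand-in for the meaning of "might prove completeness"
    return ("might", ("prove", "completeness", subj))


plain = vp("marcel")                                      # VP applied to the subject
once = lift("marcel")(vp)                                 # lifted subject applied to the VP
twice = lift(lift("marcel"))(lambda raised: raised(vp))   # subject lifted twice

print(plain == once == twice)   # True
```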

    “But obviously more needs to be said. The ways of computing 1 + 1 + 3 are all, in an intuitive sense, equally good, so long as they’ve got sufficient generality. But CCG doesn’t aspire to be as good as traditional transformational grammars that only have access to the FA rules.”

    this is confused. you’re talking about two things without realizing it. the analogy is that, just as the methods of computing 1 + 1 + 3 are equally good, the alternate derivations in CCG are, on the theory’s terms equally good. this is not a discussion about CCG relative to transformational grammar. it’s about CCG relative to itself.

    the two hypotheses you describe aren’t incompatible. it’s clear, for instance, that math is an abstract formal system but one that has neural correlates, since, of course, we do math. english, similarly, can be understood as an abstract formalism with neural correlates. this is NOT to say, however, that any current syntactic theory claims direct neural correlates (only bad and irresponsible literature would do that at this point). in the same way that two theories which look nothing alike can, in principle, both be descriptively adequate, the working syntactic theory could look nothing like the formal system that people actually “know.”

  2. nate charlow

    You may understand “two ways” in the standard way, as meaning “(at least) two ways”.

    As for my confusion, I’ve added a clarification to the body of the post which should remedy things. I was misleading in the post, since I might have implied that the “confused” passage was intended as a rebuttal of your claim that every method of computing (1) is equally good. It wasn’t, as the clarification should make clear. In any case, a rebuttal is not really necessary; the point about theoretical overdetermination is sufficient in itself. Other things equal, a theory that posits redundant explanations for a single phenomenon (with no way of telling which explanation obtains on a given occasion of the phenomenon) is to be rejected over a theory that does not. (I, of course, agree that on CCG’s terms the alternate derivations are equally good. This is precisely what I’m worried about!)

    As for your last comment, I don’t think you’re engaging the content of my post. The ontology of math is clear: math is obviously about abstracta (numbers, etc.). The ontology of syntax is less clear. If it’s about abstracta, then we run into problems (which I describe above). If it’s about concreta, then we run into other (less severe, I think) problems. What kinds of concreta is syntax about? Neural concreta? If so, CCG turns out to be pretty heavily ontologically committed — much more so than standard transformational grammar. This is a substantial cost, especially given the rudimentary state of our knowledge of human language processing.

    I agree that responsible syntactic theory will not currently claim direct neural correlates. This is totally irrelevant to the question of what syntactic theory is about, however. If the eventual goal of syntactic theory is to come up with a formalism that has direct neural correlates, then syntactic theory is about neural concreta, and the ultimate way we will assess competing syntactic theories (that are computational peers) is by assessing how well they link up with neuroscience.

  3. Lance

    I need to give this some more thought, preferably at a time that isn’t late on a Friday night. But one thing that comes to mind is this: Quantifier Raising in a sentence with more than one quantifier also allows an infinite number of ways to derive the same meaning. That is, for “every boy read some book”, you’d have…

    [Every boy]x [some book]y x read y.
    [Every boy]a [some book]b [a]x [b]y x read y.

    …and so on. Which is to say, raise “some book” and then raise “every boy” over it; then raise “some book” again and raise “every boy” over it; then…and so on.

    Or for that matter, when you take the sentence “every boy read every book”, you get either

    [Every boy]x [every book]y x read y.
    [Every book]y [every boy]x x read y.

    since you can raise either quantifier over the other, and you end up with the same meaning.
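The every/every case can be checked by brute force on a small model. The sketch below (Python; the two-boy, two-book domain is invented for illustration) evaluates both scope orders against every possible "read" relation and confirms they never come apart.

```python
from itertools import product

boys = ["boy1", "boy2"]
books = ["book1", "book2"]


def boy_over_book(read):
    # [Every boy]x [every book]y x read y
    return all(all((x, y) in read for y in books) for x in boys)


def book_over_boy(read):
    # [Every book]y [every boy]x x read y
    return all(all((x, y) in read for x in boys) for y in books)


def all_relations():
    # Every subset of boys x books, i.e. every way the reading facts could be
    pairs = list(product(boys, books))
    for bits in product([False, True], repeat=len(pairs)):
        yield {pair for pair, keep in zip(pairs, bits) if keep}


scopes_agree = all(boy_over_book(r) == book_over_boy(r) for r in all_relations())
print(scopes_agree)   # True
```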

    Danny Fox offers an economy-based account of why this doesn’t happen in practical terms, but of course the point is that an account is required. I dare say that nearly any theory of syntax and semantics will end up giving two derivations that result in identical readings for some sentences, e.g. those with more than one quantifier. So I don’t think the fact that CCG does so, albeit for a greater number of sentences, is all that problematic.

  4. Thanks for your comment, Lance.

    I’m not sure how relevant your point is, since there’s pretty clearly a principled reason for building into our theory a preference for the least amount of “fiddling” that is possible (though this raises further complications, as I argued in the post). So we seem to have a principled reason for saying which method of computing “every boy read some book” is the default/correct one (given, of course, a certain reading of this sentence) — ceteris paribus, it’s the one that involves the least fiddling with the surface structure of the sentence. (I think this — or something near enough to it — also supplies a fairly principled reason for avoiding arbitrary type-raising, which Simon mentioned earlier.)

    The worry about my (1) is that there’s no principled (i.e. non-ad hoc) reason for preferring one mode of computation to the other. This is pretty clearly an undesirable feature for a theory to have, though, I stress, it’s not necessarily a huge worry.

  5. if you allow a “least amount of fiddling” principle, all you need to do is assume that FA falls in with the least amount of fiddling category of syntactic/semantic operations, and your worry is squelched (though i’m working on a post that shows that your worry is ridiculous). to do so is not ad hoc, and there’s been a lot of work in semantics and ccg which bears this out (partee’s type shifting work, in particular). because to get the FC derivation you need to lift the NP “marcel.” there might be (and indeed, as partee proposes IS) a principle that material gets computed with its base semantic type and semantic category as a default. basically, if there’s no reason to lift, then don’t. to state this is no more ad hoc than to prohibit any inordinate amount of “fiddling.”

  6. Why do you need to lift “Marcel” to compute (1) using the FC rule?

  7. (Clarification after speaking with Simon: you don’t need to lift “Marcel” to compute (1). So my point stands, as far as I can tell.)

  8. yeah i wasn’t reading closely.

    to sketch something out for you, however, there *is* plenty non-ad-hoc reason to prefer FA to FC (a point that’s made in the literature, i believe). here’s why:

    FA: λaλb[a(b)]
    FC: λaλbλc[a(b(c))] (this is the geach operator)

    FC is clearly a higher typed, perhaps less natural operation than FA. now, i still think it’s completely irresponsible to tell processing stories, but this one seems like it should fit the bill for what you’re asking, i.e., it gives a principled reason for preferring FA to FC. in a ccg.
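The two combinators above, written out in Python with type hints as a sketch. The names `fa` and `fc` and the type variables are made up for illustration. Read as types, `fa` is (B → A) → B → A while `fc` is (B → A) → (C → B) → C → A, which is one way of cashing out "higher typed": FC takes one more argument, and that argument is itself a function.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def fa(a: Callable[[B], A]) -> Callable[[B], A]:
    # FA: λaλb[a(b)]
    return lambda b: a(b)


def fc(a: Callable[[B], A]) -> Callable[[Callable[[C], B]], Callable[[C], A]]:
    # FC: λaλbλc[a(b(c))]
    return lambda b: lambda c: a(b(c))


def add_one(n: int) -> int:
    return n + 1


def double(n: int) -> int:
    return n * 2


print(fa(add_one)(2))            # 3
print(fc(add_one)(double)(5))    # 11
```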

    this is not, however, to say that FC is undesirable or that strict FA should be taken to be a desideratum of our theory (did i use that jargon right?). there’s several reasons for this. first and most important is that FC buys you things that FA + variables doesn’t (namely ATB binding cases, i.e. every man loves and no man marries his mother–in a variable-full semantics, there’s no reason why his mother shouldn’t be able to simultaneously have both bound and unbound readings). furthermore, it’s not clear to me that theory-internal ranking of combinators is the same as saying one is more or less desirable than another. in any case, if we’re optimizing our theory, it seems that not having variables, abstract LF, assignment functions, etc., and getting ATB binding to work properly should be more important than whether or not we have the extra piece of function composition (as jacobson notes again and again, geaching/FC is EXTRA.. nothing is being swept under the rug).

    finally, an appeal to intuition might be in order. why would it make sense for a brain to have evolved function composition at all? well, for starters, to deal with CLIFFHANGERS:

    the baby’s father is……

    FC allows you to shout “WHO!?”

  9. springboarding off lance, there’s plenty of examples in contemporary “tranny” syntax of redundancy. for example, in tranny syntax you have the derived VP rule.

    “john loves his mother”

    there are two ways of computing this in standard transformational grammar.

    the first involves accidental coreference. the second uses some method of binding, i.e, the derived VP rule (see: partee).

    this redundancy is present in ccg as well, because any system that affords you the benefit of a binding combinator will give you a means to generate two derivations for sentences where the “binder” is an NP (e) type.

    this seems to be a characteristic of natural language rather than a deficiency in any formal modeling thereof.

    (so drunk)

  10. Lance

    I suppose my intended point above (he said, coming back from vacation rather suddenly) was that once you acknowledge that there are principled reasons to avoid fiddling, the question about (1) becomes not “why is there no principled reason to prefer one derivation over the other” but “what is the principled reason?”. That is, simply stating that there isn’t a principled reason–or, more properly in your comment above, stating a worry that there isn’t one–isn’t grounds to reject the theory for that shortcoming.

    It occurs to me, as well, that there may be levels of ontological commitment. That’s probably not quite what I meant to say, but perhaps look at it this way: the goal of syntax is not [yet] to model the neurological processes of the mind, but to build a theory that explains, as simply as possible, the natural phenomenon of language. Compare this to theories of orbit: though we can’t really observe orbits directly, we can observe the planets’ movement relative to us, and postulate one of two things: (1) planets revolve around the earth, with lots of retrograde motion; (2) planets revolve around the sun. Both of them might explain the data, but we prefer (2) because it’s a cleaner explanation without as much stuff cluttering up the mechanism.

    So “less cumbersome” may be somewhat more valid a reason to prefer a theory than you suggest–or at least, the point is that the commitment is to modeling and not, perhaps, literal mental computation.

    (Then again, I’ve been up for a little too long at this point, so that may not have been as coherent as I hoped. Apologies for reviving the discussion a week later.)

  11. “I suppose my intended point above (he said, coming back from vacation rather suddenly) was that once you acknowledge that there are principled reasons to avoid fiddling, the question about (1) becomes not “why is there no principled reason to prefer one derivation over the other” but “what is the principled reason?”. That is, simply stating that there isn’t a principled reason–or, more properly in your comment above, stating a worry that there isn’t one–isn’t grounds to reject the theory for that shortcoming.”

    I totally agree, and this is more or less the stance I took in the post. I believe it’s possible to answer the question “what is the principled reason [to avoid fiddling]?”, but that doing so involves being explicit about the ontology of the theory. Once we’re explicit, it’s also possible to imagine why one CCG derivation would be preferable to the other. The downside is that it will likely turn out that CCG is more heavily ontologically committed than TG.

    I’m not sure what your example shows. Theories of orbit are clearly about the actual movements of the planets. The “heliocentric” theory is preferable since it’s simpler to suppose that the actual movements of the planets don’t involve epicycles. Simple explanations appear more likely to be true (i.e. to correspond to things as they actually are) than complex explanations. This is why simplicity is a theoretical, as opposed to merely practical (i.e. pragmatically justified), virtue.

    If the simplicity of CCG is a theoretical virtue, then it is a reason to think the theory more likely to be true than its competitors. If the theory is actually aiming at truth, it ought to be explicit about its ontology. If the ontology is mental, then CCG appears to be committed to the actual existence of neural correlates of its computational rules. It doesn’t matter whether or not the exponents of CCG downplay this commitment, or whether, as Simon says, no one really thinks that any part of our syntactic theory has direct neural correlates. If the ontology of the theory is neural, then the ultimate criterion for evaluating the theory will be how well it corresponds to the neural.

  12. I’m glad you finally see the light.

    You’re right that my phrasing was sloppy. I meant that CCG is more heavily committed along one dimension (the how-many-mental-computational-rules-do-you-posit dimension). On balance it’s pretty clearly the less committed theory.

  13. i agree with all of what you wrote, except for “The downside is that it will likely turn out that CCG is more heavily ontologically committed than TG.” this is only true if you’re taking a very narrow view of the respective theories. there is much more additional apparatus in TG than CCG.. that’s sort of the point. and, as noted above, TG runs into redundancy as well.

  14. simon

    im not sure it’s fair to characterize my position as a conversion. i still maintain that ccg doesn’t commit itself to neural correlates, but i would argue that any “final” theory should do so. insofar as this is the case, ccg is a simpler (and more likely) final theory than TG. this is not, however, to say that ccg has made any ontological claims. as far as i know, this has been my point all along.




