I’m probably the wrong person to listen to when it comes to anything about that extension of abstract algebra we call category theory. I’ve been trying to learn about it for literally decades, and much of that time has involved an uncomfortable feeling that something fundamental about the material is just bouncing off of me. Contrast this with many of my colleagues who find great comfort in conceiving of mathematical concepts in the terminology of this area, and feel unsure whether they understand some concept before it has been rephrased in category-theoretic terms. Unfortunately these explanations (which scroll past me on social media and group conversations and Q&A at seminars, and...and...) don’t seem to be at the right level or angle to scaffold me up further into the topic. I experience this regularly and in some ways I am intensely jealous. Something about this doesn’t click for me: many things don’t click.
On the other hand, this feeling of being lost has been my friend and companion for many years about many topics, some of which I feel far more comfortable with now, so I continue to seek the right metaphor, the right story, that resonates with me and helps me continue to make forward progress, and teach concepts in a way that seems to work for others besides myself. Hence the story you are about to embark on.
I understand enough to know that category theory is useful as an organizing principle, and I must admit that those aspects of it that have breached my blood-brain barrier have been useful for conceptualizing in the abstract. When explaining mathematical concepts, you will not see me drawing commuting diagrams to explain concepts (except when they are literally rudimentary category-theoretic concepts), but you will see me verbally or formulaically appeal to categorical concepts like “universal properties”, albeit in more concrete terms than that. So I persist, years later, in making progress, and perhaps some of my recent sense of progress will help someone else.
Given the above, take the below with the appropriate grains of the appropriate salts!
My Problem
During one of my bouts of “let’s try to learn some category theory again”, I was poring over Eilenberg and Mac Lane’s General Theory Of Natural Equivalences, the classic paper that makes the case for natural equivalence as an important and hitherto invisible concept. Along the way, the paper introduces natural transformations, which are the basis for defining natural equivalences.
The paper starts with an observation about vector spaces and constructions on vector spaces: schematic methods for producing one kind of vector space from another. They observe some peculiar and compelling relationships between not the spaces themselves, but rather the constructions themselves.
A discussion of the “simultaneous” or “natural” character of the isomorphism \(L \simeq T(T(L))\) clearly involves a simultaneous consideration of all spaces L and all transformations \(\lambda\) connecting them; this entails a simultaneous consideration of the conjugate spaces \(T(L)\) and the induced transformations \(T(\lambda)\) connecting them [emphasis mine]. 1
The relationship is captured in the form of schematic relationships that appeal to algebraic structure-preserving functions among the constructed spaces. I was unfamiliar with the concepts, but I could at least see how the authors recognized an interesting patterns of relationship among kinds of vector spaces, how in certain circumstances those relationships were highly constrained, forced to satisfy a kind of uniformity across pairs of vector spaces related to one another by this notion of “constructions”. They argue that such constrained relationships should be recognized, and show that fundamental to this recognition is how they can be represented using the language of vector space algebra, albeit clumsily and somewhat implicitly. This relationship goes beyond vector spaces: it arises among many varieties of mathematical concepts. But it seems to be this specific application that motivated the idea of “naturality”, whose formalization called for the definition of functors which in turn required the definition of categories. 2 As I understand it, capturing naturality was the focus, and the rest were necessary components. I have seen Mac Lane rumoured to have said as much, but have yet to track down the source.
So imagine my frustration that naturality, the most important of these concepts, is the one I understand the least. What is being transformed? How exactly is that transformation “natural”? I can see, by example, that there is a pattern to the relationships that are represented by natural transformations and natural equivalences, and by the continued effectiveness of its use that there is something worth articulating here. But I have never felt a visceral grasp for naturality.
Often this naturality is explained as...well natural (pun intended), and that you need to see many examples to get a feel for it. In programming languages theory, my research area, many folks will say to me that “natural transformations are like parametric polymorphic functions.” Indeed, parametric polymorphic functions (don’t worry if you don’t know what these are) are an important concept in PL, and they can be modeled as natural transformations. So this is an important exemplar. But this is still like telling me “A tree is like a Douglas Fir”: the example is the cue to understand the concept, but in an essential way this feels backwards and unfortunate. Indeed many concepts from nature yield messy taxonomies that can only be understood in terms of history and the web of instances, but we’re talking about pure mathematics here!
Another common effort to help explain natural transformations relates them to the concept of homotopy in algebraic topology. When I sat in on Steve Awodey’s (awesome) category theory class in my post-doc days, he too followed the introduction of the standard definition with the homotopy-looking characterization. Having previously learned about homotopies, I found this definition somewhat more comforting than the original, and working through its implications was quite satisfying (and heavily influenced what follows). I could get a sense of “continuous deformation-ishness” from this characterization, but I still could not see/feel the the full force of “simultanaeity” hidden in the definition.
I’ve wanted a story that I could find more compelling, and couched in terms of the abstraction itself, rather than PL theory or topology. Alongside good examples, I hope for this to help me better understand and explain naturality, rather than simply get used to it.
I think I finally found such a story, one that tickles my happy place. But it forced me to do some violence to the common presentation of categories, functors, and natural transformations. The result is an equivalent formalization of the concepts that better suits my present mental model. This kind of move is rather audacious, and quite frankly may lead to dead ends or problematic views that ultimately cause more conceptual harm than clarity. On the other hand, another perspective may draw out new or different connections, so I offer it for your chewing enjoyment.
So what follows is my own current personal answer to the title: “what is so natural about natural transformations?”. The following will be quite abstract, because it focuses simply on the abstract definitions and abstract properties of category-theoretic concepts. Perhaps in revision I’ll add examples, because that may help explain how I connect the concrete problems addressed by this abstraction to the abstraction itself. So this won’t make much sense if you don’t already have some background in the elementary basics of categories.
What’s a Category?
Despite the importance of natural transformations, the order in which the concepts are taught tends to follow the dependency order: categories then functors then natural transformations. The early concepts are needed to define the later ones. So let’s start with a definition of “category” and go from there. As you’ll see, my definition of a category is probably not yours, but it’s equivalent. Furthermore, at its core, it’s not actually mine: Saunders Mac Lane gives it as a brief aside in his famous book Category Theory for the Working Mathematician. But you’ll see that I’ve changed the terminology and notation for my own personal story-telling purposes: I find that the nuances of informal words used formally heavily influences my thinking about formalities (weak Sapir-Whorf anyone?), so rethinking the words we use for concepts helps me internalize them. 3 I apologize for the confusion this will inevitably cause to some readers. Just remember: my reason for going into a different definition of categories from the norm is that it leads quite directly to my answer to the question posed up-front. In fact, following this thread is how I came to discover this answer. Ok, buckle up!
Informally speaking, a category is:
a collection of “stuff”;
one binary relation among stuff; and
three operators on stuff.
I will call an item of stuff an element: a “real” category theorist would call them “arrows” or “morphisms”. I’m not using those names because, those words don’t resonate with me or my inner story about what categories are, though I can see how they served Eilenberg and Mac Lane’s original examples well. History has demonstrated how much more widely these concepts can be applied, so I’m going for that level of generality
Our one (binary) relation among elements is identity. Given two elements, or more technically expressions that describe elements, they are either identical, meaning they both describe the same element or not, i.e. \(a = b\) or \(a \neq b\). The significance of identity is that given any proposition \(\Phi\) in the language of category theory, if \(a = b\) then you can replace every (free) instance of \(a\) with \(b\) because the two expressions are simply different names for the same element. This is kind of like aliasing when programming with pointers in an imperative language.
Now for the operations. All three operations take elements as inputs and (maybe) produce elements as output. The first two operations are unary, and total. Associated with every element \(a\) is a unique (in the sense of identity) left unit element \(L(a)\) and a unique right unit element \(R(a)\) (or “left element” and “right element” for short). Since these elements are unique, we represent leftness and rightness with operators rather than relations. For reference, actual category theorists would call \(L(a)\) the domain of \(a\), and \(R(a)\) the codomain of \(a\), and the domain and codomain would not be elements...err...arrows, but instead some other distinct kind of stuff called “objects”. As you’ll see, I’ve banned other stuff from my island. You can insert the dog-with-frisbee meme here: Elements?? No Objects!! ONLY ELEMENTS.
Elements, identity, and left and right elements suffice to introduce some of the properties required of \(R(a)\) and \(L(a)\). First, if element \(a\) is the left element \(L(b)\) or right element \(R(b)\) of some element \(b\), then \(L(a) = a\) and \(R(a) = a\). It’s worth noting that \(b\) could actually be \(a\), so if \(L(a) = a\) then we can deduce \(R(a) = a\) as well. That means, for example, that \[{L(R(L(L(R(L(R(R(a)))))))) = R(a)},\] for any \(a\). These “self-adjacent” elements, serve the purpose that objects do in most texts. They are usually called “identity arrows”, but for reasons that unfold below, I call them unit elements.
The third, and last, operation is binary and partial. To every two elements \(a\) and \(b\) such that \(R(a)\) = \(L(b)\), that is, for which the right element of \(a\) is identical to the left element of \(b\), we associate a unique composite element \(C(a,b)\). As-is typical, we sometimes use an infix notation \(a \mathbin{C}b\), especially when chaining compositions together. But I’ll use both prefix and infix notation interchangeably.
This raises an important (to me) conceptual-terminological point. I have strongly mixed feelings about the use of the word “composite” here, but I don’t see the possibility of a better word. In particular, the composite element \(C(a,b)\) is not composed of \(a\) and \(b\), in the sense that there are not little \(a\)’s and little \(b\)’s hiding inside of \(C(a,b)\). Rather, \(C(a,b)\) is “just” another name for some element of the category, and it’s that name that is composed of names for a and b, not the element itself. As we’ll see, every element can have many names, and the names are the things that have “compositional” structure.
My view of category theory is that it foregrounds a common and critical notion of compositional surrogative reasoning, in the sense that we reason about mathematical objects (or phenomena) by manipulating (textual) stand-ins for those actual objects. The first sense in which this is true is the difference between expressions in the language of category theory, and the elements of the category (the elements and the relationships represented by operators and relations) themselves. We are playing shadow puppets, using a highly expressive “API”, but don’t confuse the API with its implementation!
Thanks to the composition operator, one can reason about properties of some notion (like the element \(C(a,b)\)) in terms of relationships between other notions (like \(a\) and \(b\)), even if there is no literal component-wise structure in the formalism. This is why I emphasize that \(C(a,b)\) is associated with \(a\) and \(b\), not comprised of them. There may be other couples, triples, or quadruples of elements associated with that same element. Nonetheless, our elements are “atomic” in the sense that they have no discernible internal structure. The only direct analysis of elements is whether they are identical or not. So how does category theory connect to concrete mathematics? Well, that’s a story for another day, but I view categories much like I view relational database indexes: they’re a map, not the territory.
Despite everything I said about my not-real-composition perspective on composition, I still find it useful to think of these properties in real-composition terms. For instance, if we visually think of elements as Lego blocks, laid on one side, then the left (right) element can be viewed as describing the connection interface of the left (right) side of an element. Then we can imagine hooking the right of \(a\) to the left of \(b\) as possible when their descriptions are compatible, leaving the left of \(a\) still visible and available, and the right of \(b\) still visible and available. However, one must keep in mind that our notion of composition is not real hooking together of blocks: you can imagine \(C(a,b)\) as being a distinct larger monolithic Lego block that has no seams, but acts as though you had hooked an \(a\) and \(b\) together. So the composite is an element that looks “as if” it had been constructed by linking an \(a\) and \(b\), though one did not. What a strange Lego set, where you never compose, since you already have all the aggregates. This perspective guides how I interpret the properties of \(C\).
The first two properties completely determine the left and right elements of composites: \(L(C(a,b)) = L(a)\) and \(R(C(c,b)) = R(b)\). Given a bag of elements, each of which has known left and right elements, these properties constrain which associated composites are plausible.
We have more relationships that connect left and right elements to composites. The names left and right come from the relationship between these elements and composition, partly because of the above property, but also because of the following ones: \(C(L(a),a) = a\) and \(C(a,R(a)) = a\). So to use terms from algebra \(L(a)\) is a left unit of \(a\) with respect to \(C\), and \(R(a)\) is a right unit of \(a\) with respect to composition. These properties are what motivates the name unit elements because of their status as left or right units of some other element(s) with respect to composition. This further emphasizes the sense in which the element \(L(a)\) describes the left interface of \(a\) (and does nothing more), and \(R(a)\) describes the right interface of \(a\) and nothing more. My image is of an infinitesimally thin Lego block, so thin that its left interface and its right interface can’t help but be described by the same element...namely itself! And hooking it onto the left of something for which it is a left unit makes no perceptible difference, and same for the right. Stepping back for a moment, what fascinates me here is that some elements, namely the unit elements, can be used as both subjects of composition as well as descriptors for all and only those pairs of elements that yield well-formed compositions. I find this parsimony and economy of concepts viscerally satisfying, at least in the abstract.
Finally, the associations among elements that are described by composition carry a strong coherence property, which can be described succinctly, but I will spell it out in a way that better matches my thinking. Suppose three elements, \(a\), \(b\), and \(c\), and suppose that \(R(a) = L(b)\) and also \(R(b) = L(c)\). We assume nothing about \(L(a)\) or \(R(c)\) except that such elements must exist. Now, the above two equalities imply the existence of some composite element \(d = C(a,b)\) associated with \(a\) and \(b\), and some other composite element \(e = C(b,c)\) associated with \(b\) and \(c\). Now, because of the earlier composition properties, \(L(e) = L(C(b,c)) = L(b)\), so there must be an element \(f = C(a,e)\), and since \(R(d) = R(C(a,b)) = R(b)\), there must be an element \(g = C(d,b)\). The property we demand is that \(f = g\): these two distinct orders of composition yield identical associated elements. This property implies that given any sequence of elements \((a, b, c, ... z)\) such that the right element of one is identical to the left element of the subsequent element, then sequentially composing elements according to any strategy of compositions (e.g., first-to-last, last-to-first, middle-out, random) yields a unique associated element \(a \mathbin{C}b \mathbin{C}\cdot \mathbin{C}z\). We could concisely say that composition is “associative”: \(C(C(a,b),c) = C(a,C(b,c))\), but I like to draw this out and dwell on it. It’s a strong coherence property with respect to identity.
All of the above makes a category! Those who are used to category theory have surely noticed a few peculiarities of this definition. First, there’s the different naming to spark different intuitions. Terms like “arrows” or “morphisms”, have a connotation of going from one place to another, or performing a transformation, whereas “elements” have a connotation of simply “being” and having no internal structure. Elements don’t “do” anything, or contain anything: they merely observe external relationships with other elements. Second, we’ve gotten rid of “objects” as classifiers of compositions. Unit elements can do that just fine, and lead to a very different feel.
As mentioned, Mac Lane notes this definition, but goes with the traditional one: 4
Since the objects of a metacategory correspond exactly to its identity arrows, it is technically possible to dispense altogether with the objects and deal only with the arrows (CfTWMp. 9).
Indeed objects give a nice connection to sets and arrows to functions in categories associated with set-theoretic structures. But for understanding category theory in a way that is removed from its set-facing origins, and the mental biases that it brings (especially for an unabashed fan of set theory) I like to suppress those associations, and I feel that objects are a cognitive distractor: what fundamentally matters is the relationships among elements, in light of the structure mandated by the properties of left and right unit elements and composition.
In my mind, category theory is fundamentally about abstract interfaces among abstract elements, and one ought not worry about the idea, or even existence, of truly composite structures (the implementations). Focus on external relationships as the essence of categorical structure, rather than the internal composition of entities via construction of “objects”: that’s what set theory is for!
What’s a Functor?
We have now re-imagined categories as bags of inter-related elements with fundamental composition properties, described entirely in terms of elements. Let’s move on to our first kind of relationship between categories. Suppose we have two categories, \(\mathcal{C}\) and \(\mathcal{D}\). Each is a bag of elements, each has a notion of identity among elements within the category: there is simply no notion of identity among elements in distinct categories! As far as we’re concerned, the question doesn’t even make sense! Instead, we develop relationships among elements of distinct categories by adding more structure to our repertoire. For precision, we index the operations and relations of each category to emphasize that these relations and operations are well-defined only within the relevant category. For example, \(=_{\mathcal{C}}\) is the identity relation of category \(\mathcal{C}\). Relationships among categories are described by additional structure that is constrained by the details of the category-specific relationships.
A functor \(F\) describes a binary association between the elements of two categories. So to every element \(c\) of category \(\mathcal{C}\), functor \(F\) associates an element \(d\) of category \(\mathcal{D}\). Thus you can say that \(F\) has a total functional relation structure...it’s a total function, but only in the sense that a unique \(d\) is associated with each and every \(c\). I do not think of this as a procedure that operates on things, just a relationship.
We demand more from our functors than totality or functionality: a functor must also engage with the structure that the two categories possess: \(=_{\mathcal{C}}, L_{\mathcal{C}}, R_{\mathcal{C}},\) and \(C_{\mathcal{C}}\) for \(\mathcal{C}\), and \(=_{\mathcal{D}}, L_{\mathcal{D}}, R_{\mathcal{D}}\), and \(C_{\mathcal{D}}\) for \(\mathcal{D}\). First, \(F(R_{\mathcal{C}}(c)) =_{\mathcal{D}} R_{\mathcal{D}}(F(c))\) and \(F(L_{\mathcal{C}}(c)) =_{\mathcal{D}} L_{\mathcal{D}}(F(c)):\) left-elements and right-elements are preserved. Thankfully, because of the inherent properties of left-and right elements, we also get that unit elements of \(\mathcal{C}\) must map to unit elements of \(\mathcal{D}\), since \(R_{\mathcal{C}}(c) =_{\mathcal{C}} c\) implies \(R_{\mathcal{D}}(F(c)) =_{\mathcal{D}} F(c)\). The second (and last!) property is that if \({R(c_1) = L(c_2)}\), then \(F(C_{\mathcal{C}}(c_1,c_2)) = C_{\mathcal{D}}(F(c_1),F(c_2))\): associated composite elements are preserved.
From a purely technical standpoint, I much prefer this definition of functors, to standard definition, which requires thinking about mappings of two different kinds of things, and some unpleasant interactions and notational oddities that ensue. If you know, you know!
What’s a Natural Transformation, and What’s So “Natural” About It?
Ok, we’ve got categories, and we’ve got functors: it’s time for the main event! So what is a natural transformation? As with the last sections, I am simply going to give an abstract definition, without concern about its utility, only that it be equivalent to the standard one. Here’s where things get different.
Suppose you have two categories, \(\mathcal{C}\) and \(\mathcal{D}\), and you have two functors, \(F\) and \(G\), both of which describe relationships between \(\mathcal{C}\) and \(\mathcal{D}\), in that they describe structure-preserving mappings from elements of \(c\) to elements of \(d\). These are the core constituents of a natural transformation.
By my definition, a natural transformation is a fundamentally different kind of mapping from elements of \(\mathcal{C}\) to elements of \(\mathcal{D}\), constrained by the two functors. This sounds rather different from the usual presentation as being a relationship between functors, but only in the sense that the importance of the functor relationship is a bit more implicit. Nonetheless the existence of a natural transformation implies something about the two relevant functors, and the two relevant categories.
A natural transformation \(\eta\) subject to \(F\) on the left and \(G\) on the right is a total mapping from elements of \(c\) to elements of \(d\), much like each of \(F\) and \(G\) are, but with different properties. First, if \(c\) is an element of \(\mathcal{C}\) then \(L(\eta(c)) = F(L(c))\) and \(R(\eta(c)) = G(R(c))\): the two functors determine the left and right “interfaces” of the mapping for any given element.
Now here’s the punchline. Suppose \(c\) is an element of \(\mathcal{C}\), and that it can be described by the sequential compositions of \(n\) elements: \[c = c_1 \mathbin{\mathbin{C}_{\mathcal{C}}} \cdots \mathbin{\mathbin{C}_{\mathcal{C}}} c_{i-1} \mathbin{\mathbin{C}_{\mathcal{C}}} c_i \mathbin{\mathbin{C}_{\mathcal{C}}} c_{i+1} \mathbin{\mathbin{C}_{\mathcal{C}}} \cdots \mathbin{\mathbin{C}_{\mathcal{C}}} c_n,\] Then: \[\eta(c) = F(c_1) \mathbin{\mathbin{C}_{\mathcal{D}}} \cdots \mathbin{\mathbin{C}_{\mathcal{D}}} F(c_{i-1}) \mathbin{\mathbin{C}_{\mathcal{D}}} \eta(c_i) \mathbin{\mathbin{C}_{\mathcal{D}}} G(c_{i+1}) \mathbin{\mathbin{C}_{\mathcal{D}}} \cdots \mathbin{\mathbin{C}_{\mathcal{D}}} G(c_n),\] for any element \(c_i\) in the original sequence. This notation means that we can describe \(\eta(c)\) by mapping some element \(c_i\) of the sequence using \(\eta\), “recursively”, and mapping all elements to the left of it in the sequence (if any) using \(F\) and all elements to the right of it in the sequence (if any) using \(G\) and compose the results. The treatment of left and right elements by \(F\), \(\eta\), and \(G\) respectively guarantee that such a composition exists, so it’s the identicality that is the key. This property must hold for all possible composition-based descriptions of all elements of category \(\mathcal{C}\).
This property deserves some pondering to consider how wild and constraining it is! Consider our element \(c_i\). If \(c_i\) is described by another sequence of compositions in \(\mathcal{C}\), then \(\eta(c_i)\) in turn can be described by \(F\)-ing some elements of the sequence, \(\eta\)-ing one element of the sequence, and \(G\)-ing the rest of the elements. And we can replay this again. And again. And again! That is to say, the interaction that \(\eta\)’s mapping has with composition and the two functors is of a “fractal” or “self-similar” nature: we can keep “zooming in” to composition sequences and re-describing the aggregate association of \(\eta\) using a single application of \(\eta\) to “cross” from \(F\)’s actions to \(G\)’s. So this property touches every \(L_{\mathcal{C}}\),\(R_{\mathcal{C}}\), and \(C_{\mathcal{C}}\) relationship of \(\mathcal{C}\). As described here, and depends on the structure of \(\eta\)’s image in \(\mathcal{D}\). These properties are global and pervasive and simultaneous.
The naturality of natural transformations are often described according to a form of “simultaneous local coherence” properties in terms of only the analogue of unit elements (so-called “objects”). With some reasoning, we can show that a natural transformation of the kind I describe above can be fully determined by its treatment of unit elements, using a few “local” relationships about rotating which unit element gets mapped by \(\eta\). That characterization is quite compact, which gives a sense of parsimony. But to me it obscures the global implications of a natural transformation and how global the constraints fundamentally are, how “simultaneous” and “fractal” they are. That description is equivalent to this one, but I think it understates the significance of the simultaneity, by focusing the mind on unit elements, and providing a necessary and sufficient condition that doesn’t quite convey how all elements and their properties are brought to bear, in lock-step on the concept.
Naturality touches every element of the source category, and every description of every element of the source category. I think the path equation in particular underscores the extent to which a natural transformation is like a self-supporting arch: all of the pieces must fit together like clockwork to support the property. There can exist multiple natural transformations subject to two functors, but each one manifests a category-wide equilibrium that, at least intuitively speaking, would be difficult to imagine perturbing. Arches are beautiful and fascinating naturally-occurring phenomena. Their balance defies our intuitions and inspires humans to recreate that beauty in their own inventions. The same seems to be true for natural transformations: their structure was at first discovered, rather than explicitly developed, but now they are devised with full awareness by informed minds.
A reasonable question to ask: is this global formulation of natural transformations useful? To be honest I don’t know. The original motivations for natural transformations (natural equivalence) were fundamentally based on abstracting constructions on sets, and this same facet manifests in discussions of parametric polymorphic functions. Perhaps parametric polymorphic functions can be reconceived as a different form of “liftings-plus-embeddings” applied to arbitrary functions, subject to two (functorial) type constructors, rather than as a “generic function” applied to individual types (i.e., sets of values, independent of all possible mappings from them). This conception might make parametric reasoning properties more explicit. But I haven’t given this much thought (yet, I hope). I can only speculate at this point.
Thanks to Yanze Li for typo fixes!
In this quote, “transformation” refers to structure-preserving mappings between vector spaces, natural transformations.↩︎
Well actually, Eilenberg and Mac Lane have a 1942 article that identifies the concept of naturality, where they focus on group theory. My bad!↩︎
Lucky for me I’m not alone. Here are the statisticians Bruno de Finetti and L. J. Savage on this matter: “Every progress in scientific thought involves struggle against the distortion of outlook generated by the habits of ordinary language, the deficiencies and captiousness of which must be denounced, combatted, and overcome, case by case.”↩︎
Freyd and Scedrov also begin their book Categories, Allegories with an analogous definition of categories, though on a cursory look, they seem to revert to a standard characterization shortly thereafter (thanks to Andrej Bauer for that information!).↩︎