Saturday 13 May 2023

Topology Episode 3: Connectivity and Directed Graphs

\( \newcommand{\touches}{\mathrel{\delta}} \newcommand{\ntouches}{\mathrel{\not\delta}} \newcommand{\setdiff}{\mathop{\backslash}} \newcommand{\U}{\mathcal{U}} \newcommand{\inv}[1]{\U \setdiff #1} \newcommand{\R}{\mathbb{R}} \)

In our last episode, I introduced the idea of topological connectedness and related it to the mean value theorem from real analysis.  It's kind of cool that that theorem does not strictly depend on metric spaces, but rather depends on ideas of touching, in particular connectedness.

It makes a lot of sense that in a standard topology text, topological concepts are often demonstrated in the context of analysis, since historically point-set topology was a way of abstracting analysis, until at some point it became its own discipline.  But it can be nice to see some of these ideas at play in other applications.

This is a good opportunity to treat a structure is commonly used in computer science and that most folks would consider to have "topological structure", but that we do not often describe in the language of point-set topology:  directed graphs.  If you are like me, then you have some intuition that nodes and edges in a directed graph exhibit a notion of "distanceless connectivity", in the sense that we do not necessarily associate a distance metric with each edge, but we do tend to  think that two nodes that have an edge between them are "connected", and that a node that's a minimum of two hops away is also connected, but "further" than a node that's one hop away.   So how do these intuitions play out in topology?  I'll be honest: I started writing this post with intuitions that suggested formal conjectures that turned to be wrong!  I'll try to swallow my pride and point out my errors along the way or in summary.  For me, engaging with the formalism of topology in terms of touching, and reckoning with my intuitions about "the topological structure of directed graphs" taught me a lot about topology itself and what intuitions I have about identifying topological structure in graphs. And boy have I made a lot of mistakes along the way!

Connectivity, Again

First, recall the standard definition of connected space \(\U\): 

\[\begin{multline*} \forall A \subseteq \U.\, (\exists x \in \U.\,x \in A) \wedge (\exists x \in \U.\,x \in \inv{A}) \implies  \\ (\exists a \in A.\, a \touches (\U \setdiff A)) \vee (\exists x \in (\U \setdiff A).\, x \touches A).  \end{multline*} \]

The premise is about splitting the space into two non-empty parts: we don't really care about that here.  We instead care about the consequent, which imposes at least one touching constraints.  Topology books give a more compact definition of connectivity in terms of the impossibility of splitting the space into two disjoint non-empty open sets.  In my view this definition is elegant in expression but obscures the disjunction that naturally appears when you ask what is possible rather than what is impossible.  I find this framing informative.  

And to be honest, I mis-translated it at first, using conjunction rather than disjunction, which demonstrates a shortcoming in my understanding!  The key is that in any split at most one of the sets can be open, which implies that touching need not be "two-way" across the divide. I missed this fundamental asymmetry in connectedness, and it's easy to ignore this distinction in geometry  (the study of "land measurement" where measurement determines proximity), thus real analysis, but one-way connectedness is important in a directed graph, where directed flow, not distance, determines connectivity.  In a space where touching captures graph structure, being a connected component ought not require that all nodes are directly connected to one another.  As it turns out, topological connectedness can account for that.  In my view, seeing how topological connectivity can reflect graph connectivity sparks insights: it can help you more clearly grasp some of the richness (and limitations) of this topological concept, and of topology itself. 

Directed Graph Topologies

To help ground this discussion, let's consider a particular directed graph.  This one is tractable but not totally trivial:

The graph has 4 nodes and 4 edges, features a cycle, and also features a node to which one cannot return by following edges.  For lack of a better name, I'll call it "The Fallen P Graph" because it looks to me like the letter P has stumbled and fallen, but now opts to rest rather than serve its lexical master.  

Now for some topology!  A great resource for topologies on directed graphs is a technical report from 1964 by Thomas J. Ahlborn (I think that this was his MSc. thesis?), which proposes 3 different ways to assign topologies to directed graphs, though focusing primarily on one (which I'll refer to as "preferred" or "default") topology.  It was followed up by a paper from 1968 that proves more results about the preferred 1964 topology.  That Ahlborn could conceive of 3 topologies for directed graphs might hint that topological structure is not a singular thing: you can analyze different notions of proximity present in a single class of objects (in this case directed graphs).

I will start by recalling Ahlborn's first topological space, defined in terms of open sets, and then transform the definition into one phrased in terms of touching.  The strategy I use to transform a topological space defined in terms of open sets (the norm) into a topological space defined in terms of touching involves the following three observations:
  1. A set \(A\) is open if and only if the set \(\U \setdiff A\) is closed;
  2. The closure of a set \(B\) is the smallest (in terms of set containment) set \(C\) such that \(B \subseteq C\) and \(C\) is closed;
  3. An element \(a\) touches a set \(A\) if and only if \(a\) is an element of the closure of \(A\).
In general, this is kind of a pain, and getting it right requires being ruthlessly logical (says the guy who's definitely gotten it wrong by taking shortcuts!).  But I find the insight gained by characterizing touching to be psychologically helpful, even more so than thinking about the entire closure in aggregate.  Hopefully you'll see what I mean as we go along.

As a side note, I think it's useful to also keep in mind that a set \(A\) is open if and only if each element \(a\) of \(A\) does not touch the set complement (i.e., \(a \ntouches \U \setdiff A\)).   I don't find this helpful for deriving touching (at least doing so leads me more readily to error), but I do find it helpful for thinking about how open sets directly relate to touching.  Perhaps if I focused on the interior of a set and characterizing apartness (i.e. non-touching), this would be the better way to go.
 
For the following definitions of topologies on directed graphs, let \(\U\) be the set of all the nodes of a graph, \(a\) a single node, and \(A\) some (possibly empty) set of nodes. 


Definition 1a (Ahlborn's default/preferred).   A set of nodes \(A\) is open if and only if there is no edge from any node \(x \in \U \setdiff A\) to any node \(a \in  A\).  

Metaphorically, you can think of this an open set as a castle on lock-down that ensures no entry from the outside world!  Let's consider the open sets of The Fallen P Graph.  Since the graph has 4 nodes, there are \(2^4 = 16\) subsets of nodes, including all of the nodes (\(\U\)) and none of them (\(\emptyset\)).  Which of the 16 are open? Conveniently 16 is a small-ish number (unlike the number of real numbers between 0 and 1, for instance), so we can just check all of them.  Because I'm a pedant, I wrote down all 16 subsets on paper and started checking them, one by one, but if I eyeballed it, I might have been able to just write them down (though to be honest, I probably would have forgotten \(\emptyset\) even though it's always open).  Here are the 3 (count 'em, three) open sets:  \(\emptyset\), \(\{A\}\), and \(\{A,B,C,D\}\).  
Wow, that's...underwhelming.  Two of those sets (\(\U\) and \(\emptyset\) are always open.  So we get one singleton as a prize: you can confirm that no edges from outside of  \(\{A\}\) hit the only node in the set.  Personally I don't find that this enumeration of open sets gives me all that much insight.

So instead let's get back to my favourite lens on topology: what does Ahlborn's preferred topology imply for touching?  Well, let's follow the steps above:

Step 1: 

If \(A\) is open then \(U \setdiff A\) is closed.  How do we describe closed positively, rather than just "not open"?  Well, a set \(A\) is closed if every edge from a point in \(A\) arrives at some other point in \(A\). Interesting: my intuitions about "graph topology" quite like this definition of closed sets.  Of course intuitions are a fickle guide, but this proffers a feeling of comfort---moreso than the definition of open despite their being interchangeable as definitions of the topology.

Technically we have two proof obligations.  First we must prove that if \(A\) is open then \(\U \setdiff A\) has this property, which seems not bad since the latter set is everything else so where the heck else is an edge going to land?  Second, we must also prove that if \(A\) satisfies this property, then  \(U \setdiff A\) is open:  also not bad since we've ensured that \(A\) is everything else as far as \(U \setdiff A\) is concerned, and since every edge in \(A\) lands in \(A\), none of them land in \(U \setdiff A\) (isn't the law of excluded middle grand?).

Since The Fallen P has 3 open sets, it also has 3 closed sets: \(\emptyset\), \(\{B,C,D\}\), and  \(\{A,B,C,D\}\).  You can confirm that any edge that starts at a node in \(\{B,C,D\}\) ends in that same set.  Don't read too much into this example, though:  a more complex directed graph can produce some more interesting open/closed bifurcations of the set of all nodes \(\U\).

Step 2:

Here's where things get fun.  Suppose \(B\) is a set.  What is its closure, as in, what is the smallest set \(C\) such that every node in \(B\) is in \(C\) (that's the definition of set containment), and furthermore, every edge that starts in \(C\) ends in \(C\)?  I find it helpful here to  think "algorithmically" for a second:  First, add all the nodes from \(B\) into our potential \(C\).  Next, suppose we find every edge that starts in our potential \(C\) and ends outside of potential \(C\), and just add those new nodes to \(C\).  Cool, now all the nodes from \(B\) are in \(C\) and end in \(C\).  But what about these new nodes?  Some edges that start from new nodes may end in \(C\) but others may not.  So we should add new new nodes, based on following edges that take us outside of \(C\).  The same thing happens, so we need to add new new new nodes.  And so on.  Ideally you see where this is going:  the closure of \(B\) is the set of nodes that includes \(B\) itself and any node \(x\) that can be reached in any number of steps from a node in \(B\). Another way to put this latter part is to say that there is a path from some node in \(B\) to \(x\).  The whole thing can be succinctly summarized by saying \(x\) is reachable from \(B\).   

So once we have our closed \(C\) we can be sure that every node \(x \in \U \setdiff C\) is not reachable from \(C\), so there is no single edge from \(C\) into \(\U \setdiff C\), so the latter is open.  This all seems to be working out.

Returning to The Fallen P, we can give some examples of closure.  First the boring ones, the closure of each closed set, like \(\{B,C,D\}\) is itself:  we need not add any extra nodes to this set to make it closed.  Another way to put it is that the set itself is the smallest set that contains it and is closed.   As for non-boring ones, there are some interesting phenomena.  The closure of \(\{B,D\}\) is \(\{B,C,D\}\) because of the two closed sets that contain \(\{B,D\}\)---namely \(\{B,C,D\}\) and \(\{A,B,C,D\}\)---\(\{B,C,D\}\) is the smallest.  On the other hand, the closure of any set that contains \(A\) is \(\{A,B,C,D\}\) because the latter is the only closed set that contains the former.

Step 3:

The hard work has been done, and now we can state our definition by determining what it means for a node to be an element of the closure of a set of nodes.  That gives us a new definition

Definition 1b (Ahlborn's default/preferred).  \(a \touches A\) if and only if \(a \in A\) or there exists a path from some node in \(A\) to \(a\).

So in Ahlborn's topology, touching boils down to a node either being an element of a set or being reachable from the set.  Cool! Spelling this out was very helpful for me because it was not obvious to me that touching could partly be determined by nodes that are not themselves elements of \(A\).  Because of paths, a node could touch \(A\) not because of an edge from \(A\) to it, but rather because some other node not in \(A\) has an edge to it.  So it's useful to keep the idea of closure in mind when you think about touching, even though touching is the property that continuous functions preserve, not closure!  

This topology has some interesting features worth pondering. First, notice that \(a \touches \{a\}\) even if \(a\) has no edge from \(a\) to \(a\).  So touching has a fundamental "reflexivity" to it:  a graph with a self-loop has the same default topology as one that does not.  In particular, the following graph has the same Ahlborn-preferred topology as The Fallen P:


 
Second, since non-trivial touching is characterized by the mere existence of paths, having multiple paths to a node makes no difference in the topology imposed by the graph.  Furthermore, the touching relationship tells you nothing about how many edges one must traverse on the path from one node to another node, so there is no notion of "distance" either.   In particular, the following graph also has the same Ahlborn-preferred topology as The Fallen P:

In Fallen P, it takes "two hops" to get from A to C;  In this graph it only takes "one hop".  Topology don't give a hoot!

In fact, this particular topology can be determined from just the reflexive-transitive closure of edges.   
In short, the Ahlborn-preferred topology abstracts the reachability properties of a graph from the rest of the directed graph structure.  It's important to remember that the notion of "reachability" here always claims that a node is always reachable from itself, path or no path.

We've talked about open sets, closed sets, and closure in this topology.  I also find it useful to take a moment to consider two other common topological features:
  1. The interior of a set \(A\) is the largest (again by set containment) set \(B \subseteq A\) for which no edges start in \(\U \setdiff A\) and end in \(B\).  This looks a lot like the definition of an open set, and that's no coincidence because \(B\) is open.  So taking the interior is like pulling up the drawbridges and closing the gates:  you can hear the sound of battle outside the castle walls, being waged by those who were who are left on the outside  ("At the going down of the sun and in the morning, We will remember them").  So the interior of a set of nodes is those nodes that are not reachable from outside the set.  For example, in The Fallen P, the interior of \(\{A,B\}\) is \(\{A\}\).  
  2. The boundary of a set \(A\) is those elements of \(\U\) that touch both \(A\) and \(\U \setdiff A\).  In this topology, this is the set of nodes that are reachable from inside and outside the set.  For example, in the graph A --> B <-- C,  \(\{B\}\) is the boundary of both  \(\{A\}\)  and  \(\{C\}\). In the Fallen P,  \(\{B\}\) is the boundary of a number of subsets, including  \(\{A\}\).  Can you name all the other subsets of \(\U\) for which  \(\{B\}\) is the boundary?

Finally, a useful exercise for you would be to revisit the touching axioms and confirm that they all hold for each of these definitions.  In fact the "transitivity" case of touching helps explain why the definition of touching for Ahlborn's preferred topology is in terms of paths rather than edges, so this is well worth considering. 

Connectivity

Ahlborn relates topological connectivity in his default topology to graph structure, connecting it to weak connectivity.  A directed graph is weakly connected if every node is reachable from every other node so long as you ignore the directionality of the arrows.  In other words, a directed graph is weakly connected if its corresponding undirected graph is connected.  He relates other graph properties to topology as well!  This is comforting and supports the idea that the topology does preserve some interesting properties of the graph which we can then call "topological" properties.  But are these exactly all and only the topological properties of a directed graph?  Not so much!  That Ahlborn provides demonstrates other topologies says as much...

Two More Topologies

Here I will just give the touching-based definitions of Ahlborn's other two topologies.

Definition 2 (Ahlborn's \(\gamma\)).  \(a \touches A\) if and only if \(a \in A\) or there exists a path from \(a\) to some node in \(A\).

Definition 3 (Ahlborn's \(\delta\)).   This one's a tad peculiar.  First define an anti-edge from \(a\) to \(b\) to be the absence of such an edge in a graph.  Second, define an anti-path from \(a\) to \(b\) to be a transitive sequence of anti-edges starting at \(a\) and ending at \(b\).  Then:
\(a \touches A\) if and only if \(a \in A\) or there exists an anti-path from some node in \(A\) to \(a\).

It's useful to note that given a graph \(G\), the \(\gamma\) topology is the same as the default topology applied to the reversed graph \(G'\) that arises by reversing all of the edges of \(G\).  So there is a duality between the default and \(\gamma\) topologies, induced by how you treat the directedness of edges. 

Similarly, given a graph \(G\), the \(\delta\) topology is the same as the default topology applied to the shadow graph \(G*\) that arises by replacing all of the edges of \(G\) with all of the edges absent from \(G\).  This implies a different sort of duality between the default and \(\delta\) topologies, induced by the presence and absence of edges. 
 
Let's see how these new topologies play out for Fallen P, repeated here for convenience:



For \(\gamma\), we can define touching by introducing a touching relation for every edge, i.e.: \[ A \touches \{B\},  B \touches \{C\},  C \touches \{D\},  D \touches \{C\}, \] and then expand the touched sets every arbitrary way, and expand the touched nodes transitively!  Don't make me write out all the cases... In any case, this topology is the same as the default topology on "Reverse Fallen P":

For \(\delta\), we consider the anti-edges, but in a style analogous to the default topology, i.e.: \[ B \touches \{A\}, C \touches \{A\}, D \touches \{A\},  etc. \] and expand accordingly. This topology is the same as the default topology on "Shadow Fallen P":


So many topologies!  Which one is the right one?  I expect that the answer, as usual is "it depends":  on what questions about a graph you care about, and can be isolated in one of these topological structures.  I expect that there are other topologies that might be devised and applied to directed graphs.  For instance, why not have a topology assigned to the edges of a directed graph?  

So there are lots of "topological" properties of a directed graph, in the sense that they can be viewed through the lens of topology, but it's not clear to me what the limit of them are or could be, either theoretically or pragmatically.  

Examples/Exercises

Here's a few more simple graphs for you to ponder and enumerate open sets, closed sets, touching, boundaries, and what have you.  The parentheticals include graph properties defined by Ahlborn but not explained here: consult Ahlborn's tech report for details.

Example 1: A --> B

This graph has two nodes and one edge.  We can write out the cases of touching for the different topologies, omitting the cases that are true simply because touching extends set membership: 



Example 2: A <-- B --> C  (weakly connected but not unilaterally)

Example 3: A    B --> C (disconnected) 


Directed Graphs of Topologies?

It's worth noting that we can ask the reverse question:  do topologies correspond to directed graphs?  Well, in general no, but I think that any arbitrary touching relation can be seen as yielding a special kind of hyper-graph, where one side of every edge is a singleton set (you'll have to decide the direction, it's the difference between the default and \(\gamma\) topologies)!  I would not be surprised if this perspective, generalized to arbitrary hypergraphs, might be related to proximity and near sets
A topic to explore for another day.

Acknowledgments

I have to thank Michael Betancourt for inspiring these investigations and this post, and providing wonderful feedback and suggestions.  His efforts to explain some essential aspects of topology to statistical modeling students sparked a wide-ranging conversation that led me to ponder directed graphs (among other things).

History

May 13, 2023 - First Posted
May 16, 2023 - GraphViz'd the graphs
June 7, 2023 - Incorporated a running example, fixed broken definition of \(\delta\) topology.