WARNING: the following contains a whole lot of pedantry about proving theorems at a level of detail such that you could likely convince a computer proof assistant of their correctness. I teach a course where students learn to write such proofs without computer assistance, because doing so can profoundly affect how one perceives the possibilities for formal definition and proof. 1 Sometimes the default reasoning principles provided by a proof assistant can restrict one’s perspective of what reasoning principles are acceptable, how and why they might be judged acceptable, and how to devise and validate new ones.
More relevant to the present post is that even the literature and pedagogy of writing paper proofs is rife with defaults. I spend a lot of time teaching about proof by induction, and I find that my students, most of whom have learned something about proving stuff in the past, often arrive constrained by a very regimented approach to induction, involving mandatory “base cases” and “induction cases” and these nebulous “induction hypotheses” which seem to descend from the sky when invoked by arcane ceremony. I have some strong feelings about this, mostly due to my own struggles with gaining comfort with induction as a student, so I try to help students evade those struggles.
Taken at face value, this post is about some other proof principle, proof by infinite descent, but at its heart it’s really about induction. Ideally by the end of this post, I will have challenged, but also shed light on, the nature of inductive proof cases and induction hypotheses. As a bonus, you might see how induction can be used to explain another proof principle that intuitively seems valid, but for which it may not be obvious how to justify it formally.
Two Proofs That \(\sqrt{2}\) is Not Rational
In an excellent blog post about the difference between proof of negation and proof by contradiction, Andrej Bauer gives as an example a proof that \(\sqrt{2}\) is not a rational number. The article is worth a read, but I would note in particular that in a technical sense, his proof is not a proof that it is an irrational number, just that it is not a rational number. I imagine that one could also prove that \(\sqrt{-1}\) is not rational, but that had better not count as a proof that it is irrational! However, if we also prove that \(\sqrt{2}\) is a real number, which could be done using Dedekind cuts for instance, then we can deduce that \(\sqrt{2}\) is indeed irrational. And to no one’s surprise, we can prove that \(\sqrt{-1}\) is not a real number.
Theorem:\(\sqrt{2}\) is not rational.
Proof. Suppose \(\sqrt{2}\) were equal to a fraction \(\frac{a}{b}\) with \(a\) and \(b\) relatively prime. Then we would get \(a^2 = 2b^2\), hence \(a^2\) is even and so is \(a\). Write \(a = 2c\) and plug it back in to get \(2c^2 = b^2\), from which we conclude that \(b\) is even as well. This is a contradiction since \(a\) and \(b\) were assumed to be relatively prime. QED.
This is a proof of the negation of some proposition, i.e., that assuming the truth of the proposition implies a contradiction. Let’s consider some of its details.
First, the proof assumes that \(\sqrt{2}\) is equal to a fraction \(\frac{a}{b}\), which implicitly implies that \(a\) and \(b\) are integers and that \(b\) is not 0. We could limit the two numbers to be naturals, by which I mean non-negative integers, applying the convention that \(\sqrt{2}\) denotes specifically the positive root, but since there exists a negative number that when squared equals 2, and since generalizing the claim plays into my overall story, let’s make no commitment to whether \(a\) and \(b\) are positive or negative.
Next, the proof uses the fact that every rational number can be represented by a fraction \(\frac{a}{b}\) such that \(a\) and \(b\) are relatively prime, in other words that no positive integer other than 1 divides them evenly, which is a precise way of saying that \(\frac{a}{b}\) is in reduced form. This is a fact that needs to be proved separately and prior to this proof. We could instead ntegrate this reasoning directly into the proof: omit the relative primality assumption, and then let \(a' = a/\gcd(a,b)\) and \(b' = b/\gcd(a,b)\) where \(gcd(a,b)\) is the greatest common divisor of both. It then follows from some proof effort that \(a'\) and \(b'\) are relatively prime and \(\frac{a'}{b'} = \frac{a}{b}\), thus completing the general proof. However, we’re still using an auxiliary fact that can be proven by induction: that \(a\) and \(b\) have a unique greatest common divisor. More on that below.
From these assumptions and facts, the proof deduces that both \(a\) and \(b\) are even, which means that \(2\) divides both of them, which contradicts the assumption that among positive integers only 1 can do this. So assuming that \(\sqrt{2}\) is rational leads to contradiction.
Now to drive the topic of this post, here’s a more annoying proof of the same fact:
Proof. Suppose \(\sqrt{2}\) were equal to a fraction \(\frac{a}{b}\). Then \(a\) can’t be \(0\) because \(\frac{0}{b}^2 \neq 2\). Then we would get \(a^2 = 2b^2\), hence \(a^2\) is even and so is \(a\). Write \(a = 2c\) and plug it back in to get \(2c^2 = b^2\), from which we conclude that \(b\) is even as well. Write \(b = 2d\) and plug it back in to get \(c^2 = 2d^2\). This yields a contradiction because we could divide our integers by \(2\) forever, which is impossible. QED.
On the upside, this proof gets to omit the assumption that \(a\) and \(b\) are relatively prime, which in turn means we don’t need to consult Wikipedia to remind ourselves what the heck “relatively prime” means anyway. But on the other hand...what just happened?!? Intuitively this makes sense, but is this legit?
While Andrej’s proof is a direct proof of negation, this annoying one is an example of proof by infinite descent, which according to Wikipedia is a particular kind of proof by contradiction, and relies on something called the the well-ordering principle. By the end of this post, I hope to convince you that: a) it is not a kind of proof by contradiction: rather it’s a technique for proving negation, so this claim deserves some finger-wagging from Andrej. b) it does not rely on the well-ordering principle, and good thing that because if it did, then the annoying proof would be broken, but it’s not.
We should be able to ascertain from a formal statement of the principle that it is simply a proof of negation. Andrej’s post elaborates on the source of confusion. And what is the problem with the claim that proof by infinite descent relies on well-ordering, whatever that is? To explain, let’s first pinpoint a more fundamental and general property, called well-foundedness, and reasoning principle, well-founded induction, that the principle of infinite descent can be based upon and even generalized, and use that to explain the confusion. Along the way, we’ll produce a different explanation of proof by infinite descent in terms of induction: an explanation that suits my induction-happy brain better. Also, we’ll get the opportunity to investigate induction itself: in particular, I want to demonstrate that the common “base case/induction case” recipe for induction is a useful but limiting view of how induction can be applied.
Well-foundedness and Well-ordering
As you might guess from the above two proofs, they depend on some notion of “getting smaller”, and in the case of natural numbers the default notion is the less-than relation \(<\), but less-than-ness is just an example of a more general phenomenon. So first, let’s isolate the property of \(<\) on natural numbers that really powers induction. A binary relation \(\sqsubset\) is called well-founded if all leftward chains of the form \(\dots \sqsubset x_3 \sqsubset x_2 \sqsubset x_1\) must be finite in length, so there cannot exist any infinite descending chains. The \(<\) relation on natural numbers has this property, because starting from any \(n\) and producing smaller numbers and stopping whenever you want, you will always have to stop, either necessarily at \(0\) or earlier by choice. On the other hand, \(<\) extended to integers does not have this property because you can go forever down the negative numbers, e.g., \(\dots -16 < -8 < -4 < -2\). Luckily, \(<\) isn’t the only binary relation on integers! Here’s a different one that is well-founded: if \(z_1,z_2 \in \mathbb{Z}\), say \(z_1 \mathbin{\mathbin{\lvert<\rvert}}z_2\) if and only if \(\lvert z_1\rvert < \lvert z_2\rvert\), where \(\lvert z\rvert\) denotes the absolute value of \(z\) and \(<\) is restricted to natural numbers. Then \(2 \mathbin{\mathbin{\lvert<\rvert}}3\) but also \(-2 \mathbin{\mathbin{\lvert<\rvert}}3\), \(2 \mathbin{\mathbin{\lvert<\rvert}}{-3}\), and \(-2 \mathbin{\mathbin{\lvert<\rvert}}-3\): the sign of the integer is ignored! Notice, though, that \(2\) and \(-2\) are not related by \(\mathbin{\mathbin{\lvert<\rvert}}\) in either direction. Since every leftward chain of the \(\mathbin{\mathbin{\lvert<\rvert}}\) relation works its way toward \(0\), the center of the number line, we can see that all such chains must be finite.
Since “well-ordering” and “well-founded” both start with “well-”, you might suspect that there’s a relationship between them, and you’d be right. Well-orderedness is described a variety of ways in the literature, but for our purposes the following is the most relevant formulation: a well-founded relation \(\sqsubset\) is a well-ordering if and only if it relates every distinct pair of elements, i.e. \(x_1 \neq x_2\) implies either \(x_1 \sqsubset x_2\) or \(x_2 \sqsubset x_1\). So based on what we’ve observed, \(<\) is well-ordered, but \(\mathbin{\mathbin{\lvert<\rvert}}\) is not (remember \(2\) and \(-2\)?).
To tie up a loose terminological end, consider the “-ordering” and “-founded” suffixes. Well-orderings must be transitive: if \(x_1 \sqsubset x_2\) and \(x_2 \sqsubset x_3\) then \(x_1 \sqsubset x_3\). The consequent must hold because if it didn’t, then totality would force \(x_3 \sqsubset x_1\), but that would introduce a three-element cycle \(\dots \sqsubset x_3 \sqsubset x_1 \sqsubset x_2 \sqsubset x_3\) and break well-foundedness. Then, given some well-ordering \(\sqsubset\), extending it to the relation \(\sqsubseteq\), which extends \(\sqsubset\) to be reflexive, is a total order, i.e., a partial order (reflexive, transitive, antisymmetric relation) that is also total. On the other hand, a well-founded relation does not have to be transitive, let alone total, so extending one in this manner is not even guaranteed to get you a partial order, let alone a total one. Hence the suffix “-founded”.
Proof by Infinite Descent, Generalized
Now that we have some grasp of well-foundedness, and why well-orderedness is a red herring, Let’s use it to state the Principle of Proof by Infinite Descent precisely and generally.
Theorem: Principle of Infinite Descent
Let \(X\) be a set and \(\sqsubset\) a well-founded relation on \(X\). Furthermore let \(\Phi(x)\) be some property of elements \(x\) of \(X\). Then if for every \(x\) in \(X\), \(\Phi(x)\) implies \(\Phi(y)\) for some \(y \sqsubset x\), it follows that \(\Phi(x)\) does not hold for any \(x\) in \(X\).Stated in formal logical notation: \[(\forall x\in X.\, \Phi(x) \implies \exists y \in X.\, y \sqsubset x \land \Phi(y)) \implies \forall x\in X.\, \neg\Phi(x).\]
The premise of the principle formalizes the idea that “no matter where you are (which x), you can still make a ’descending’ step.”, and the conclusion is that “well, if so then your property can’t apply to any x: it leads to contradiction.” Arguably a better name for the principle would be Proof by No Infinite Descent (thanks to Andrej for this observation!).
Our proof that \(\sqrt{2}\) is not rational implicitly applies this principle: Let \(X\) be the integers \(\mathbb{Z}\); for integers \(a,b\), let \(a \sqsubset b\) be \(a \mathbin{\mathbin{\lvert<\rvert}}b\); and let the property \(\Phi(a)\) be “\(a^2 = 2b^2\) for some integer \(b\)” (formally, \(\exists b\in \mathbb{Z}.\,a^2 = 2b^2\)). Assuming \(a \neq 0\) and taking \(b\), the proof uses division by \(2\) to produce a \(c \mathbin{\mathbin{\lvert<\rvert}}a\) for which some \(d\) satisfies the equation. Having satisfied the premises of the principle, we use it to deduce that no \(a\) satisfies the property (though admittedly the prose proof shrouds this use of the principle in squishy “forever is impossible” terminology).
The intuition for the principle is pretty straightforward: if the premises hold, then in principle we can build an infinite descending \(\mathbin{\mathbin{\lvert<\rvert}}\)-sequence of integers that satisfy the property. But there are no such infinite sequences, because we assumed that all of them must be finite, so a contradiction must follow.
But let’s buttress this intuition with rigour. We can prove this principle as a consequence of:
The Principle of Well-founded Induction: Let \(X\) be a set and \(\sqsubset\) a well-founded relation on \(X\). Then let \(\Psi(x)\) be some property of elements \(x\) of \(X\). Then if for every \(x\) in \(X\), \(\Psi(y)\) holding for every \(y \sqsubset x\) suffices to prove \(\Psi(x)\), it follows that \(\Psi(x)\) holds for every \(x\) in \(X\).
Stated in formal logical notation: \[(\forall x\in X.\, (\forall y \in X.\, y \sqsubset x \implies \Psi(y)) \implies \Psi(x)) \implies \forall x\in X.\, \Psi(x).\]
This principle is quite abstract, since it’s stated in terms of a general principle. To clarify, let’s instantiate it with particular well-founded relations that lead to principles with which we are familiar. We already observed that \(<\) is a well-founded relation on natural numbers (because all well-orderings are also well-founded), we can specialize well-founded induction to get what is often called strong mathematical induction or course-of-values induction on natural numbers: \[(\forall n\in \mathbb{N}.\, ((\forall m \in \mathbb{N}.\, m < n \implies \Psi(m)) \implies \Psi(n)) \implies \forall n\in \mathbb{N}.\, \Psi(n).\]
The general structure of a proof by course-of-values induction is to assume some natural number \(n\), then prove that assuming that \(\Psi(m)\) holds for every number less than \(n\) suffices to prove \(\Psi(n)\). Achieving that suffices to prove that \(\Psi\) holds for every natural number \(n\). Observe that the assumption “\(\Psi(m)\) holds for every number less than \(n\)” is what gets called an “induction hypothesis”. In the common style of prose proofs, the “induction hypothesis” often gets evoked in the middle of an argument, and for a specific value, rather than explicitly assumed at the beginning of the argument in full generality. To me this always seemed to me to be some magical thing that got pulled out of the air in the middle of the proof, leaving me unsure and about what kind of thing induction even was. But in fact the induction hypothesis is just an assumption that you make as you attempt to prove \(\Psi(n)\), and its justification is hidden in the proof of the principle of well-founded induction. Surprise: you can prove induction principles correct, and devise and prove correct new ones: they need not be taken as axioms!
Though students have often seen strong mathematical induction, they are more familiar with plain ole’ mathematical induction, where the “induction hypothesis” is to assume that a property holds for a natural number one less than the current one. We can get this by using the well-founded relation \(n_1 <_1 n_2\) which holds if and only if \(n_2 = n_1 + 1\). This relation is neither total nor an order since it only ever relates abutting natural numbers. But, curiously enough, we can still plug it into the Principle of Proof by Infinite Descent if we want, and get something meaningful, albeit more restrictive, than using \(<\).
Advertisement: I first learned about this perspective on induction from Glynn Winskel’s excellent textbook, The Formal Semantics of Programming Languages. I highly recommend it.
What even is a “base case”?
A common structure used in proofs that proceed by either mathematical induction or course-of-values induction, or at least in teaching such proofs, is to assume \(n\), assume the induction hypothesis, and then proceed by cases on whether \(n = 0\) or whether \(n > 0\). The first case gets called the “base case” and is proven outright, and the second case gets called the “inductive case” which exploits “the induction hypothesis” to complete that case. When using induction more generally, as is common in PL theory, this notion of base cases and induction cases gets emphasized as well. However, it’s worth looking back at the principle of well-founded induction and observing that nothing about that theorem forces a particular case analysis, or restricts the availability of the premise that gets called “the induction hypothesis”. The funny thing is that the “base case” also has a perfectly fine induction hypothesis, but typically it is useless because most of the time the proof in question is not equipped to conjure an \(m < n\) with which to exploit it.
This template for applying the proof by course-of-values induction is so common that often the split into base case vs. inductive case, where the base case does not appeal to induction, gets baked right into the explanation of the proof principle, as if it were part of the approach. I find this categorical specialization to be unfortunate for two reasons.
The first reason is that sometimes a proof needs a case split, but this particular case split is the wrong one: in that it does not help push the proof through. My favourite demonstration of this phenomenon is a proof that any two positive natural numbers have a greatest common divisor. This proof could be extended to integers, and thus apply to Andrej’s proof as discussed above, but doing so would deviate more from my main point than I would like. The proof involves a helper lemma followed by a main theorem.
Lemma. For all positive natural numbers \(a\) and for all positive natural numbers \(b\), if \(a \geq b\) then there exists a positive natural number \(c\) such that \(xc = a\) for some positive natural \(x\) and \(yc = b\) for some positive natural \(y\), and furthermore if \(d\) is a positive natural number that satisfies these properties, then \(c \geq d\).
This lemma can be proven by course-of-values induction. Then the statement of the main theorem has the same structure, but does not assume \(a \geq b\). It can be proven by splitting cases on whether \(a \geq b\) or \(b \geq a\) and then applying the lemma with the two numbers appropriately ordered (either order will work if the numbers are equal).
I’m only going to give a sketch of the lemma’s proof, since it suffices to make the relevant point. It proceeds by course-of-values induction on \(a\) where the relevant property of a is the substatement : “for all positive natural numbers \(b\) ...” (now you know why I explicitly separated the two variable quantifications in the prose statement of the lemma). After assuming the induction hypothesis, assuming \(b\) and assuming \(a \geq b\), we proceed by cases on whether \(a = b\) or \(a > b\). If \(a = b\) then we can show that the greatest common divisor is the number itself, so \(a\). If \(a > b\), then we can show that the greatest common divisor of \(a-b\) and \(b\) is also the greatest common divisor of \(a\) and \(b\). However, to appropriately apply the induction hypothesis, we must first determine which of \(a-b\) and \(b\) is larger: since \(b\) is positive, we know that \(a-b < a\), and we previously assumed \(b < a\), so either can play the role of “smaller \(a\)”, but to make sufficient progress with the induction hypothesis, we need “next \(b\)” to be less than “smaller \(a\)”, so we split cases on whether \(a-b < a\) or not, arbitrarily letting the alternate case also handle \(a-b = a\).
The interesting part of the above proof is that two cases are considered, and only one of them appeals to the induction hypothesis. But the case that does not is not an \(a=0\) case as per the default template. In fact \(a = 0\) could never apply since we assumed that \(a\) is positive.
For exactness sake, it’s worth pointing out that one can perfectly well devise a principle of course-of-values induction that applies to just the positive integers, such that \(a = 1\) would be viewed as the “base case”. If we were to separate out that case, we could deduce that \(b = 1\) because it must be a positive integer, and it must be less than or equal \(a\). So in practice, the \(a = 1\) case can get handled using the exact same reasoning as the other \(a = b\) cases, which happen to circumscribe those situations where no appeal to an “induction hypothesis” is required.
So the lesson to be drawn from my first gripe is that when performing induction, if we reconceive of “base cases” to mean those cases that do not appeal to an induction hypothesis, then such cases need not coincide with the cases that involve the minimal elements of the well-founded relation (or the unique minimum element of a well-ordered relation). For many proofs the default heuristic helps, but that default is not a universal law of induction.
No base case? no problem!
To reiterate, my first problem with how course-of-values induction is taught is that the default base-case/inductive-case split may not be the most fruitful one. The second problem I have is that you may not even need such a split. Usually it is assumed that there must be some “base case”, that does not get to exploit the induction hypothesis, for how could it ever apply? But in some proofs you can always exploit the induction hypothesis. This is exactly the case in the following proof of the Principle of Infinite Descent that uses the Principle of Well-founded Induction.
Theorem: Principle of Infinite Descent \[(\forall x\in X.\, \Phi(x) \implies \exists y \in X.\, y \sqsubset x \land \Phi(y)) \implies \forall x\in X.\, \neg\Phi(x).\]
Proof. Suppose \((\forall x\in X.\, \Phi(x) \implies \exists y \in X.\, y \sqsubset x \land \Phi(y))\). Now we prove \(\forall x\in X.\, \neg\Phi(x)\) by well-founded induction on \(x\in X\). We will use as our predicate \(\Psi(x) := \neg\Phi(x)\), which is synonymous with \(\Phi(x) \implies \bot\), i.e. \(\Phi(x)\) implies contradiction. Suppose \(x \in X\), and suppose \(((\forall y \in X.\, y \sqsubset x \implies \neg\Phi(y))\) At this point it suffices to show \(\neg\Phi(x)\), for which it suffices to assume \(\Phi(x)\) and deduce a contradiction. So suppose \(\Phi(x)\). Now, applying our first assumption to this particular \(x\), we learn \(\Phi(x) \implies \exists y \in X.\, y \sqsubset x \land \Phi(y))\), and combining this with our most recent assumption, we learn that \(\exists y \in X.\, y \sqsubset x \land \Phi(y)\), so consider that \(y \in X\). Since \(y \sqsubset x\), we can use the induction hypothesis to prove \(\neg\Phi(y)\). But we also have \(\Phi(y)\), to which we can apply \(\neg\Phi(y)\) to deduce a contradiction. QED.
This proof of proof by infinite descent used course of values induction, but did not need to consider distinct cases related to \(x \in X\). There is no “base case” here, in the sense that no part of the proof must be separately completed without the help of an induction hypothesis. Now, when specialized to natural numbers, one could do so, breaking out the \(0\) case and arguing directly that there is no smaller number, without reference at all to \(\Phi(x)\). Or more nebulously, we can split on “minimal elements” versus “non-minimal elements”, but to what end?
Part way back around, without proof by contradiction
The following is a bonus: thanks to Andrej who made the observation that inspired it.
We’ve proven that if \(\sqsubset\) is well-founded, then principle of infinite descent applies. What about the reverse? The answer is “it depends.” In particular, if you prefer to work without proof by contradiction (i.e. constructively), then you can only get part way there. But with some use of proof by contradiction, you can get all the way there. In particular, we can prove that for an arbitrary binary relation \(R\) on a set \(X\), if it satisfies the statement of the principle of infinite descent, then it has no infinite descending chains. Wait, isn’t that enough? Well, no, not in a world where you want to work constructively, without proof by contradiction. I sneakily defined well-foundedness as the assertion that all chains are finite, which is a stronger statement than the assertion that there can be no infinite descending chains, so long as you avoid proof by contradiction. The positive conception is nice because it works in both contexts, whereas “no infinite descending chains”, which is often taken as the definition of well-foundedness in classical logic, does not.
Ok on to the proof. First, let’s get precise about descending chains. Define a descending chain of a binary relation \(R\) to be a partial function \(f : \mathbb{N} \rightharpoonup X\) such that for all natural numbers \(n\), if \(f(n+1)\) is defined, then so is \(f(n)\), and furthermore \(f(n+1) \mathrel{R} f(n)\). Then an infinite descending chain is a descending chain that is defined for every natural number \(n\).
Now, we can prove that if \(R\) is an arbitrary binary relation (rather than a well-founded onw) that nonetheless satisfies the letter of the principle of infinite descent, then \(R\) has no infinite descending chains.
Proof. Suppose \(R \subseteq X\times X\), and suppose that for any property \(\Phi(x)\): \[(\forall x\in X.\, \Phi(x) \implies \exists y \in X.\, y \mathrel{R} x \land \Phi(y)) \implies \forall x\in X.\, \neg\Phi(x).\] Then it suffices to assume \(f : \mathbb{N} \to X\) is an infinite descending chain and then deduce a contradiction. To do so, it further suffices to let \(\Phi(x) := \exists n\in\mathbb{N}.f(n)=x\) and prove the premise of the proof-by-infinite-descent-shaped assumption, because from that we can deduce that \(\forall x\in X.\neg\exists n\in N.\,f(n)=x\), (i.e. \(f\) is totally undefined, the empty partial function) which combined with \(x = f(0)\) and \(n = 0\) suffices to deduce a contradiction.
Let’s prove the sufficient condition: suppose \(x \in X\) and \(\Phi(x)\), i.e. that \(f(n) = x\) for some \(n \in \mathbb{N}\). Then \(\Phi(f(n+1)\) holds, and by assumption \(f(n+1) \mathrel{R} f(n)\), which can be combined to prove the existential conclusion. QED.
Coda
According to John Stillwell’s Elements of Algebra (Section 2.10), Blaise Pascal was “the first to use induction in the ’base case, induction step’ format” in a paper from 1654. But our modern understanding of induction has it’s origins in Richard Dedekind’s seminal 1888 book “Was sind und was sollen die Zahlen?” (sometimes translated “What are the numbers and what are they for?”). Many defaults—be they notation, techniques, and explanations—in mathematics have historical origins that precede a more sophisticated understanding of phenomena, and sometimes an impedance mismatch between the historical and the modern can limit our understanding (treatment of variables and functions by mathematical texts is one of my standard whipping horses, but that’s a topic for another day). Those perspectives persist because they were useful, and are still useful today, but it also helps to consider how they might best be incorporated with those refined conceptions we’ve gained since their inception.
As for the use of proof by infinite descent versus a proof like Andrej’s that argues directly from minimality, one could reasonably argue that explicitly calling out the existence of a minimal case is more intuitively graspable for some arguments. It’s not clear to me whether arguing from the perspective of “there must be one or more minimal cases” versus “you can’t go down this way forever” is more generally clear, especially once you have come to understand proof by infinite descent. Furthermore, in the case of general well-foundedness, there may not be a single smallest case, but rather a (possibly) infinite number of minimal cases, in which case calling all of them out abstractly may be less compelling than calling out a singular minimum as was the case (pun intended) for the natural numbers. Which argument is most enlightening probably depends on personal taste.
Acknowledgments
Many thanks to Andrej Bauer for feedback on this post! Thanks to folks on Hacker News who noted that the annoying proof did not address \(a=0\).