Theoretical Physics could do with borrowing some ideals from AlexNet and Sutton's Bitter Lesson

Moving Beyond "Beauty", "Elegance" and suchlike Abstractional Biases


Sabine Hossenfelder is perhaps one of the most interesting thinkers I know of. Her consistency and regularity on YouTube are particularly admirable: every week there are two new videos, and none of them is just a reaction video or a "here's what happened, and here's what I think about it" piece. They are all well researched and built around a strong core thesis.

One of her main ideas is that post-relativity theoretical physics has bet much (perhaps too much) on symmetry, elegance, and similar handwavy, loosely definable abstractions that all boil down to a preference for elegant patterns.

"What is especially striking and remarkable is that in fundamental physics a beautiful or elegant theory is more likely to be right than a theory that is inelegant."
- Murray Gell-Mann

This isn't a very disagreeable statement. A basic Occam's razor-like argument, where simplicity is a given and everything else emerges from basic principles, isn't completely invalid.

Of course, results like Noether's theorem or the principle of least action reinforce this very lossy heuristic in people.

Here's an interesting slide from one of Hossenfelder's presentations on suchlike topics:

"Physicists Rely on Beauty, Physicists think that the foundations of physics are not pretty enough, They invent prettier theories and are then surprised if no evidence is found to support them, They are largely unaware that this is what they are doing because requirements of 'beauty' have become mathematical standards"

This is, prima facie, true: Supersymmetry, string theory, gravitons, the fifth fundamental force, etc. are all such attempts conforming to this comment.

There are obvious instances of observer/anthropic bias at play here: force x *operates* at distance d, particles x *combine* into y, etc. Even such small declarations rest on fairly liberal assumptions (hierarchies, conditional operations, etc.).

These are mostly right but should be invoked carefully, because the very act of invoking them is strongly tied to a skewed, opinionated and non-objective understanding of reality.

Appendix: String theory in itself is a very observer-biased view of things. "Excitations on a string" is a strong metaphor that uses strings, something we *can* observe, instead of something that we can't. Fair enough! Sapir-Whorf-ish territory here.

Now, clearly this isn't the right way ahead, but there are good reasons why such thinking exists and is even reinforced. Cognitive biases push us to look for simplicity, and beyond a certain point the human mind simply cannot build a theory that is both consistent and high in explanatory power, whether within one person's head or across a group.

"Just as there are odors that dogs can smell and we cannot, as well as sounds that dogs can hear and we cannot, so too there are wavelengths of light we cannot see and flavors we cannot taste. Why then, given our brains wired the way they are, does the remark 'Perhaps there are thoughts we cannot think' surprise you?"
- Richard Hamming

Just as there is computational irreducibility (i.e., processes whose outcome has no shortcut, so the only way to know it is to run every step), there is an obvious "cognitive irreducibility" (i.e., phenomena too convoluted for a human to model well enough).
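
Computational irreducibility, in Wolfram's sense, is usually illustrated with Rule 30, an elementary cellular automaton. The sketch below is only a toy: the update rule fits on one line, yet no general shortcut is known for predicting a far-down row other than computing every row before it (this is widely believed, not proven). The function names `rule30_step`, `run` and `render` are my own, chosen for illustration.

```python
def rule30_step(cells):
    """Apply one step of Rule 30 to a tuple of 0/1 cells (edges wrap around)."""
    n = len(cells)
    # Rule 30: new cell = left XOR (center OR right)
    return tuple(
        cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
        for i in range(n)
    )


def run(width=31, steps=15):
    """Start from a single live cell in the middle and record every row."""
    cells = tuple(1 if i == width // 2 else 0 for i in range(width))
    history = [cells]
    for _ in range(steps):
        cells = rule30_step(cells)
        history.append(cells)
    return history


def render(history):
    """Draw each row as text, '#' for 1 and '.' for 0."""
    return "\n".join("".join("#" if c else "." for c in row) for row in history)


if __name__ == "__main__":
    print(render(run()))
```

Running it prints a triangle whose left edge is regular and whose interior looks irregular. To get row 15 the code has to go through rows 1 to 14, which is the whole point.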

Wait... I've seen this somewhere else... Such a counter-productive obsession with elegance and beauty existed in AI for a long time (if you ask me to point to when it finally ended, it was perhaps AlexNet in 2012).

Of course, there are obvious major distinctions between computer vision (which AlexNet "solved") and physics: absence of noise, hierarchies, interpretability, etc.

But, just like in the pre-AlexNet era, we are still assuming general patterns (see the "unreasonable effectiveness of mathematics"), discounting emergence, etc.

Whether or not god plays dice is a different question, but even if there were a nice "intelligent design", chaos, emergence and complexity will spawn regardless. These are theoretically guaranteed when you have sufficiently stochastic processes carried out by enough entities. Outlier cases will exist. Long tails will exist. No formulation of causal processes can sufficiently model everything.

I want to go back and make a very short and basic case for why deep learning is so unreasonably effective in the first place. My general thesis (which is backed by experimental results from past research) is this:

Many high-dimensional data sets that occur in the real world actually lie along low-dimensional latent manifolds inside that high-dimensional space.
=> many data sets that initially appear to require many variables to describe can actually be described by a comparatively small number of variables
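
A minimal sketch of that claim, under loud assumptions: it covers only the simplest special case, where the "manifold" is a flat 2-dimensional subspace of a 20-dimensional space, and it measures the dimension with plain matrix rank. Real data is noisy and its manifolds are curved, so this is an illustration, not a method. All names (`make_latent_data`, `make_unstructured_data`, `numerical_rank`) and all sizes are hypothetical choices of mine.

```python
import random


def make_latent_data(n_samples=200, ambient_dim=20, latent_dim=2, seed=0):
    """Samples with 20 coordinates each, secretly driven by only 2 hidden variables."""
    rng = random.Random(seed)
    mixing = [[rng.gauss(0, 1) for _ in range(ambient_dim)] for _ in range(latent_dim)]
    data = []
    for _ in range(n_samples):
        z = [rng.uniform(-1, 1) for _ in range(latent_dim)]  # the hidden variables
        data.append(
            [sum(z[k] * mixing[k][j] for k in range(latent_dim)) for j in range(ambient_dim)]
        )
    return data


def make_unstructured_data(n_samples=200, ambient_dim=20, seed=1):
    """Samples whose 20 coordinates are all independent: no low-dimensional structure."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(ambient_dim)] for _ in range(n_samples)]


def numerical_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


if __name__ == "__main__":
    print("latent-driven data rank:", numerical_rank(make_latent_data()))
    print("unstructured data rank:", numerical_rank(make_unstructured_data()))
```

Both data sets have 20 columns, but only the second one actually needs 20 numbers per sample. The first can be described by 2, even though nothing in its raw coordinates says so.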

Neural nets have enough combinatorial expressivity, and a good enough learning procedure (backprop + SGD), to take in massively wide datasets and to attribute causal phenomena, through learned weights, to a few "latent manifolds". This is exactly what we want (not from an interpretability view, but from a utility one): to cut through the noise and find tractable patterns that best model phenomena.

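This is, in my opinion, best illustrated by the "lottery ticket hypothesis", which says that a large, randomly initialized network contains a small subnetwork that, trained on its own, can match the full network. On my reading, massive, overparametrized neural nets let backprop + SGD do their magic and find these low-dimensional latent manifolds via sheer brute force.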

The core lesson here is simple, and it's the second part of Sutton's Bitter Lesson. The first part gets a lot of attention, but the second is very apt here:

"The general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done."

Sutton makes 2 broad points:
1. Do not wishfully bank on symmetry and simplicity to magically come in, offer explanations, and save you.
2. Keep your longings and biases, which are incredibly skewed by anthropomorphization and the observer bottleneck, away. Indulging them is in vain.

AlexNet did exactly this, at a time when almost everyone was falling into the very pitfalls Sutton warns against.

Coming back to physics, there is interesting work in this vein:

AlphaFold, in my opinion, best captures my main thesis in this post: it learned the interaction patterns within proteins well enough to predict their structure, not through intentionally baked-in biases, but largely through the sheer power of deep learning.

A team at DeepMind used neural nets to simulate quantum mechanical phenomena.

Miles Cranmer had a great lecture titled "The Next Great Scientific Theory is Hiding Inside a Neural Network", which covered using neural nets to predict physical behaviour and then distilling the principles the neural net has learnt into symbolic form.
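
To make the "symbolic form" step concrete, here is a deliberately tiny sketch and not the method from the lecture. A plain function, `black_box`, stands in for a trained network (an assumption, so the example stays self-contained), and a brute-force search over four hand-picked candidate formulas stands in for real symbolic regression, which searches a vastly larger space of expressions. The candidate list, the constant 6.5 and every name here are hypothetical.

```python
import random


def black_box(m1, m2, r):
    """Stand-in for a trained network; pretend we cannot read this formula."""
    return 6.5 * m1 * m2 / r**2


# Hand-picked candidate laws, each to be tried with a free scale constant k.
CANDIDATES = {
    "m1*m2/r": lambda m1, m2, r: m1 * m2 / r,
    "m1*m2/r**2": lambda m1, m2, r: m1 * m2 / r**2,
    "(m1+m2)/r**2": lambda m1, m2, r: (m1 + m2) / r**2,
    "m1*m2*r": lambda m1, m2, r: m1 * m2 * r,
}


def fit_scale(feature, target):
    """Least-squares k that minimises sum((k * feature - target)**2)."""
    return sum(f * t for f, t in zip(feature, target)) / sum(f * f for f in feature)


def distill(n_samples=200, seed=0):
    """Probe the black box, then return (formula, k, mean squared error) of the best fit."""
    rng = random.Random(seed)
    inputs = [
        (rng.uniform(1, 5), rng.uniform(1, 5), rng.uniform(1, 5))
        for _ in range(n_samples)
    ]
    target = [black_box(*x) for x in inputs]
    best = None
    for name, expr in CANDIDATES.items():
        feature = [expr(*x) for x in inputs]
        k = fit_scale(feature, target)
        err = sum((k * f - t) ** 2 for f, t in zip(feature, target)) / n_samples
        if best is None or err < best[2]:
            best = (name, k, err)
    return best


if __name__ == "__main__":
    name, k, err = distill()
    print(f"best formula: {k:.3f} * {name}  (mse={err:.2e})")
```

The search only ever sees input-output pairs from the black box, yet it recovers a readable inverse-square law. The design choice worth noting is the split of labour: the opaque model does the fitting, and the symbolic step comes afterwards as a separate pass over what the model has learnt.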

I am particularly interested in the latter part.

I would have loved to end this piece with a "don't be ignorant of either of these viewpoints, find a suitable middle ground", but I really doubt there is one, at least not one in sight now.