Physics, AI and Lean

2026-02-27

Recently there has been a dramatic increase in the number of people using AI to do physics in Lean. This ranges from seasoned researchers to weekend hobbyists.

The first thing I want to say is that it is genuinely impressive how far AI has come and what it can now do in the Lean sphere.

However, I also feel it is important to get across to physicists, and to others who might have only a cursory understanding of Lean, that many of these AI projects are producing what I would consider “bad code”.

What does this mean? In the same way that there are two parts to writing a physics paper, doing the calculation and then finding the best way to articulate that calculation in a paper, there are two parts to writing Lean code.

The first part is getting to “merely true” (a term coined, I believe, by Kim Morrison). That is, proving the theorem you want by whatever means possible. This is like doing the calculation.

The second part is making sure that the Lean code produced is reusable, maintainable, conforms to library standards and, where possible, is integrated into existing libraries. This is akin to making sure the calculation is well articulated. For ease of terminology, I am going to call this the “polishing part”, although the word polishing does not convey the magnitude of what it does to the code.

Within Lean both these parts are extremely important, and it is worth adding that it is often the polishing part which is the most work and requires the most skill and expertise.
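To make this concrete, here is a toy illustration (not taken from any real library) of the same identity proved in a “merely true” style and in a polished style:

```lean
import Mathlib

-- Merely true: a verbose rewrite chain that obscures the idea.
example (a b : ℝ) : (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by
  rw [pow_two, pow_two, pow_two, add_mul, mul_add, mul_add]
  ring

-- Polished: the idiomatic one-tactic proof a library would keep.
example (a b : ℝ) : (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by
  ring
```

Both proofs are accepted by Lean; only one of them is something you would want to maintain, and the gap between the two styles is far wider for real results than for this toy.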

In my opinion, and I believe the opinion of many Lean experts, code which gets to merely true but which skips the polishing part is bad code. To go back to my analogy, it is like doing a calculation with pen and paper and submitting the scribbles to the arXiv in the hope that the reader will understand them. To be completely objective, earlier this year I wrote on my blog some standards for what makes good Lean code; code which is not polished often falls foul of multiple of the points there.

So how does this tie in with AI-written code? As you can probably guess, Lean libraries written with AI are often not polished. In fact, I am yet to see a polished one in the realm of physics. In many cases these libraries don’t even get to merely-true, as they introduce axioms along the way (the use of axioms in any Lean code is IMO a no-go, but that is a story for another day).
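To see why axioms are so dangerous, consider this sketch (`badAxiom` is a hypothetical name; nothing like it should ever appear in real code):

```lean
-- A single smuggled-in axiom makes every statement "provable".
axiom badAxiom : ∀ (P : Prop), P

-- This now compiles, despite being false:
theorem one_eq_two : 1 = 2 := badAxiom _

#print axioms one_eq_two  -- reveals the dependence on `badAxiom`
```

Once an axiom is in play, a compiling proof tells you nothing; `#print axioms` is the only defence, and nobody reading an unpolished library runs it on every theorem.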

So why should a physicist care about any of this? To me there is one killer reason, which every physicist should care about, even if they are not interested in helping the Lean ecosystem. This is readability. Lean does not check for many things: it does not check that definitions are correct, or that the theorems it is proving are the ones you want it to prove; it checks mathematical correctness only. The problem is that code which is not polished is often unreadable by anyone except the authors (go back to the example of a hand-written calculation), and thus one cannot verify or trust the correctness of these other aspects of the code.
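As a sketch of what Lean cannot catch (the names here are hypothetical), consider a definition that is mathematically well-formed but physically wrong. Lean will happily prove theorems about it; only a human reader can spot the mistake:

```lean
import Mathlib

-- Wrong definition: kinetic energy should be m * v ^ 2 / 2.
noncomputable def kineticEnergy (m v : ℝ) : ℝ := m * v

-- Mathematically correct, physically meaningless:
theorem kineticEnergy_zero (m : ℝ) : kineticEnergy m 0 = 0 := by
  simp [kineticEnergy]
```

The proof above is perfectly valid Lean, and that is exactly the problem: verification of the definitions themselves rests entirely on a human being able to read them.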

This of course assumes that the (human) authors of the formalization themselves understand it mathematically and can read the code. I think it is an even worse situation if they cannot, as then there is no trusted verification (at all) that the definitions are correct, or that the right theorems are being proved. This leads to the question: whose job is it to provide such verification? To me, the answer should always be the human authors. Not reviewers. Not random members of the community.

Another reason physicists should care about polishing is reusability. Namely, how easy is it for someone to take your code and reuse what you have done in their own proofs? Code which is only merely-true usually fits only the single purpose it was written for and is often not reusable for other tasks. The problem here is that someone else must come along and essentially rewrite the code, often putting far more thought and care into it. My main issue is that the person who does this is likely to get essentially zero credit in the physics community, even if they have done the majority of the work (something I think we must change).

The third and final reason physicists should care is reliability. This means how easy the code is to maintain long-term, how easy it is to run on small computers, and so on. Lean code written with AI tends to be very long, and long code is slow to compile. This makes it hard to maintain and to keep up to date with changes in the library and the wider ecosystem.

At this point, the reader is probably thinking to themselves: well, if we can use AI to get to “merely-true”, then we can use it to do the “polishing”. To answer this, I want to adapt the title of Fred Schwed Jr.’s famous finance book “Where are the customers’ yachts?”, and ask “Where are Mathlib’s PRs?”. If the task of polishing is easy to do with AI, why is no one doing it, and why is Mathlib not inundated with high-quality PRs from AI?

A future without physics papers?

2026-02-11

One could envision a future where we no longer need physics papers, with results instead written directly into the interactive theorem prover Lean, in projects like PhysLean, with the guarantee of mathematical correctness that thereby comes.

But one should be careful. To echo a message I recently posted on the Lean Zulip: Lean does not verify the physics context of results. It does not verify that “this is mainstream physics”, that “quantum mechanics corresponds to the real world”, or that “this is of interest to the community”; it only verifies the correctness of the mathematics, and that really is it.

The reason we have journals is to verify all these things collectively, and we do that via the concept of proof by authority, or verification by trusted source. The trusted source here is the journal and its peer-review process, which say “yes, this is correct both mathematically and physically, and it is of interest to the community”.

In a future without physics papers, and with a focus on building PhysLean and other Lean libraries, we would need a way to include such verification of this ‘physics context’ directly in the project. My idea here is to use cryptographically secure digital signatures, so that trusted authorities (similar to journals) can sign individual results in Lean projects, thereby verifying by authority the parts that Lean does not itself verify.
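As a very rough sketch of the shape of such an attestation (every name here is hypothetical; no such structure exists in any real library), one could imagine pairing a Lean declaration with an external signature:

```lean
-- Hypothetical sketch: an external attestation attached to a named result.
structure SignedResult where
  declName  : String      -- fully qualified name of the Lean theorem
  authority : String      -- the signing body, playing the role of a journal
  signature : ByteArray   -- signature over the declaration's printed statement
```

The actual signing and verification would of course happen outside Lean, with the signature binding the authority to a specific statement at a specific version of the library.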

Good Lean code for Physics

2026-01-05

As interactive theorem provers get increasingly popular, and as AI makes it easier to write correct, compiling Lean code, I think it is important to distinguish good from bad correct Lean code in the context of physics. This is naturally related to the question of how to detect ‘AI slop’ when it comes to Lean code in physics.

To me good Lean code for physics has, at least, the following features:

  1. Organized as an API around key data structures
  2. Has no axioms
  3. Has short modules with a clear purpose
  4. Has good clear and concise documentation (not AI slop)
  5. Has concise proofs
  6. Uses existing formalizations where possible – doesn’t reproduce work already done
  7. Uses a fixed naming convention (following e.g. that of Mathlib)
  8. … probably many more
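To illustrate point 1, here is a minimal sketch (with hypothetical names, not from PhysLean) of what “API around a key data structure” means in practice: a structure together with the basic lemmas one constantly needs about it.

```lean
import Mathlib

/-- A point particle, carrying its mass and the fact that the mass is positive. -/
structure PointParticle where
  mass : ℝ
  mass_pos : 0 < mass

namespace PointParticle

/-- The momentum of a particle moving with (one-dimensional) velocity `v`. -/
noncomputable def momentum (p : PointParticle) (v : ℝ) : ℝ := p.mass * v

@[simp]
lemma momentum_zero (p : PointParticle) : p.momentum 0 = 0 := by
  simp [momentum]

lemma momentum_add (p : PointParticle) (v w : ℝ) :
    p.momentum (v + w) = p.momentum v + p.momentum w := by
  simp [momentum, mul_add]

end PointParticle
```

The point is not the physics content, which is trivial here, but the shape: a central definition, a namespace, and small reusable lemmas, rather than one monolithic proof.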

Citations and AI

2026-01-05

I myself am not a researcher in AI, however, I work in interactive theorem provers, which is a hot-topic in the AI world, and thus interact with a lot of people and companies in the AI sphere.

Really since the start of the AI boom there have been discussions about companies’ use of data to train their models, whether that be Reddit posts, books, copyrighted works of art, or, closer to home for me, academic papers and formalized definitions and proofs. The law has so far sided with the AI companies here, ruling that the use of such training data is “fair use”. However, there is another question: is it academic misconduct not to cite the training data you (or someone else) used to train the AI model that helped write your paper, or come up with your theory? I think many would fall back on the above “fair use” ruling and say no. However, citations have nothing to do with copyright. They are about crediting work that has been done; they make it easier to trace precedents; and they have been so fundamental to scientific progress that their importance is drilled into students from very early on.

It is my opinion that if you are writing an academic paper, or coming up with a theory, with the help of an AI, and you are not citing the training data you (or the AI makers) used, you are not working to the level of academic integrity that we have subscribed to for many decades.

The real problem is that asking for this level of academic integrity takes away many AI companies’ profit mechanisms.

(My views here are totally my own, and do not represent any position of projects I work on etc.)

Axiomization vs Formalization

2025-12-23

Since this keeps coming up in different scenarios, I thought it was worth writing something about (this is also discussed in one of my papers):

Formalizing results from physics into Lean or another ITP does not necessarily imply the axiomization of those results. If it did, the amount of material one could formalize into an ITP from physics would be extremely limited. In the same way that a physics paper does not need to start from an axiomization of its area of physics, a formalization does not either. As long as the starting point is well-defined and can be mathematically formulated, then we are good - and you still get a lot of the benefits of Lean.

It is this approach which allows us in PhysLean to formalize areas related to quantum field theory, the Higgs boson, high-level results in classical mechanics and special relativity, etc. We pick a starting point, and then derive results from that starting point - just as a physics paper does. Those starting points can be interconnected with each other, and could be refactored and moved further and further back, until they meet an axiomization (or multiple), if indeed that is possible.
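As a sketch of what “pick a starting point and derive from it” looks like in Lean (hypothetical names; this is not a PhysLean result), the starting point can simply be a hypothesis of the theorem, rather than a global axiom:

```lean
import Mathlib

-- The starting point, Hooke's law for this particular force function, is a
-- hypothesis `hooke`, not an axiom. Nothing is postulated globally.
theorem force_antisymm (F : ℝ → ℝ) (k : ℝ)
    (hooke : ∀ x, F x = -k * x) :
    ∀ x, F (-x) = -F x := by
  intro x
  rw [hooke, hooke]
  ring
```

Anyone can later prove the hypothesis from something more fundamental and compose the results; nothing about the derivation is compromised by not starting from first principles.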

Datastructures and physics

2025-12-15

One of the things which I found interesting from formalising Wick’s theorem (the precursor to Feynman diagrams) is how many of the ‘complex’ things physicists work with can be reduced to very simple data structures in computer science, and how the precise representation of the data structure we reduce them to can massively affect how easy it is to work with them or think about them.

Take, for example, Wick contractions, which are often taught in a Master’s level quantum field theory course. In computer science they essentially correspond to unordered pairings of indices in a list, where each index is in at most one pair. There are lots of ways one could represent this data structure in a program like Lean, from inductive types based on the inductive definition of a list, to involutions from the indices of the list back to itself. After lots of trial and error, the definition I ended up with is the one shown below. It may not be the prettiest definition in the world, but it was the one that was easiest to work with and to build an API around.

[Code listing: Wick contraction]
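For readers without the listing to hand, here is a rough sketch (hypothetical name, and not the actual PhysLean definition) of the kind of data structure being described: a set of two-element index sets, pairwise disjoint, so that each index lies in at most one pair.

```lean
import Mathlib

-- Illustrative sketch only: an unordered pairing of the indices of a list of
-- length `n`, where every pair has exactly two elements and distinct pairs
-- are disjoint.
def PairingSketch (n : ℕ) : Type :=
  {c : Finset (Finset (Fin n)) //
    (∀ p ∈ c, p.card = 2) ∧
    ∀ p ∈ c, ∀ q ∈ c, p = q ∨ Disjoint p q}
```

Even at this toy scale, the representational choices (sets of pairs versus involutions versus inductive types) change which lemmas are easy to state and prove.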

Trust in academic papers

2025-11-29

For a long time academia has relied on an element of trust. This is especially true in areas like physics or the natural sciences, where papers do not give every nitty-gritty detail of proofs, calculations, or analyses.

When reading papers in these disciplines we are putting our trust in the authors that they have checked the proofs, calculations or analyses, and that they are not spreading misinformation (or even worse disinformation).

As AI has become more prevalent, it is up for debate whether the quality of information in papers has decreased. I personally would argue it has, but I think something less contentious is this: the ability to trust papers and their content has gone down.

There are ways to increase the trust we have in papers. These include peer review, although I would argue that in some areas this does not put as much trust back into papers as one would naively hope. Another way is through verification methods, like interactive theorem provers.

Physics and mathematics context

2025-11-29

In some of my talks and papers I try and articulate how I view the difference between mathematics and physics, from the point of view of programs like Lean.

In both mathematics and physics there are definitions, equations, theorems and proofs. Each such result has its “mathematical context” - the mathematical structures that make it up and the way that those structures are arranged to make a mathematically well-defined statement. For example, the mathematical context of the statement ‘F = m * a’ is that it is a propositional equality between two vectors, one of which is a non-negative real number times the other.

Mathematics is about the beauty or complexity of said mathematical context, or the insights which the mathematical context gives you.

However, in physics, each result carries, in addition to its mathematical context, a physical context. For example, in ‘F = m * a’, the vector ‘F’ describes a force, and ‘a’ an acceleration. Physics cares about both the physics context of a result and the mathematical context, and, importantly, how they interact with one another.
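In Lean terms, the mathematical context of ‘F = m * a’ might look something like the following sketch (the choice of `EuclideanSpace ℝ (Fin 3)` and the variable names are mine, purely for illustration):

```lean
import Mathlib

-- A propositional equality between two vectors in ℝ³, with `m` a non-negative
-- real scalar. Nothing in this statement records that `F` is a force or `a`
-- an acceleration; that physical context lies outside what Lean checks.
example (F a : EuclideanSpace ℝ (Fin 3)) (m : ℝ) (_hm : 0 ≤ m) : Prop :=
  F = m • a
```

Everything Lean verifies about this statement is visible above; the words “force” and “acceleration” appear only in the comment.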

To me this clearly marks mathematics and physics as distinct subjects, and goes against the “spectrum” between physics and maths that is sometimes portrayed. It explains why, for example, maths papers are split into definitions and theorems, whilst physics papers are written in prose.

Interactive theorem provers (ITPs) are built around the mathematical context of results, and they help us ensure that the mathematical context is correct. However, ITPs don’t say anything about the physics context of results. In the project PhysLean, we are trying to include that physics context through organization, documentation, names of lemmas and theorems, and the style of proofs.

This page was last updated at 2026-03-13 16:35.