Local Minima: The Training Plateau Problem

Sun Mar 29 2026

The personal best is not the true best.

Sometimes in physical training, we hit personal records we did not think were possible, by accident, unprepared. Conversely, deliberate attempts to beat existing records fail, even with more training, discipline, and volume. The common factor in both situations is the landscape. The first athlete escaped it with a large leap. The second is playing by the rules of a surface that has already flattened. A machine learning concept explains this cleanly: local minima.

In machine learning, training a model means driving its loss, a measure of error, as low as possible. Think of the loss as a vast, uneven terrain: mountains, valleys, and slopes, with the model somewhere on that terrain searching for the lowest point, where errors are smallest. The algorithm that does this is gradient descent. It works the way water finds its level: it checks the slope immediately around it, moves in the steepest downhill direction, and repeats.

The problem is that there are usually multiple valleys, and the goal is not just any valley but the lowest one. A local minimum is a point where every immediate direction looks uphill, while a better solution exists elsewhere on the surface.
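The whole trap fits in a few lines. Here is a minimal sketch (the one-dimensional double-well surface, starting point, and learning rate are all invented for illustration): gradient descent walks downhill from where it starts and settles in the nearer, shallower valley, even though a deeper one exists.

```python
def loss(x):
    # toy double-well surface: shallow local minimum near x ≈ 0.93,
    # deeper global minimum near x ≈ -1.06
    return x**4 - 2 * x**2 + 0.5 * x

def grad(x):
    # derivative of the loss: the local slope
    return 4 * x**3 - 4 * x + 0.5

x = 2.0  # starting position, on the right-hand slope
for _ in range(200):
    x -= 0.01 * grad(x)  # small step in the steepest downhill direction

print(round(x, 2))  # settles near 0.93: the local minimum, not the global one
```

Every immediate direction from 0.93 points uphill, so plain descent can never reach the deeper valley at -1.06 from here.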

Early in training, gains are often fast because the body is still changing. Every session offers a clear downhill direction. Then the same workouts stop producing change: the surface has flattened, the body has adapted to the current stimulus, and it has found a local equilibrium. It is not at its ceiling. It is at the ceiling of the current configuration. The programme that produced initial gains has now become the constraint on further gains. The body found the best version of itself those inputs can produce. Asking it to do more of the same is like running gradient descent longer on a flat surface. The slope is gone. There is nothing to follow.

Plateau ⇒ ∇_θ L(θ) ≈ 0 at a local, not global, minimum

In training terms, this means improvements from the same exercise pattern taper toward zero.

In machine learning, a direct way to escape local minima is to take larger steps that overshoot the local valley and land in another region, or to introduce noise into descent so the system can stumble out of shallow traps.
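The step-size route can be sketched deterministically (the noise route works similarly but is stochastic, so it is omitted here). On the same kind of invented double-well surface, a cautious learning rate converges into the shallow valley, while a larger one overshoots that valley on its very first step and lands in the deeper basin:

```python
def grad(x):
    # slope of a double-well loss x**4 - 2*x**2 + 0.5*x:
    # local minimum near 0.93, global minimum near -1.06
    return 4 * x**3 - 4 * x + 0.5

def descend(x, lr, steps=200):
    for _ in range(steps):
        x -= lr * grad(x)  # repeated downhill steps of size lr
    return x

cautious = descend(2.0, lr=0.01)  # trapped in the shallow valley, near 0.93
bold = descend(2.0, lr=0.1)       # first step overshoots it, lands near -1.06
```

The larger step is not smarter, only bigger: it clears the ridge between the valleys before the surface has a chance to hold it.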

Changing rep range, movement pattern, training frequency, or load scheme repositions the body on a different part of the loss landscape, one where the surface is still steep and descent is still possible. The body encounters inputs it has not optimised for and is forced to adapt again. Progressive overload is gradient descent with a schedule. It prevents full flattening by continuously shifting the target. The moment adaptation catches the stimulus, the stimulus moves. The body is not allowed to fully converge. This is why programmes without progression eventually produce the same result as no programme at all. The gradient disappears, and the system settles.
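That moving target can be sketched with an invented one-dimensional "fitness" variable chasing a quadratic loss. With a fixed target, descent converges and stops; when the target advances every time adaptation gets close, the system never fully converges and keeps climbing:

```python
def train(x, target, steps, progress):
    for _ in range(steps):
        x -= 0.1 * 2 * (x - target)  # gradient step on the loss (x - target)**2
        if progress and abs(x - target) < 0.05:
            target += 1.0  # adaptation caught the stimulus, so the stimulus moves
    return x

fixed = train(0.0, 1.0, 200, progress=False)   # converges to 1.0 and settles
moving = train(0.0, 1.0, 200, progress=True)   # ends far past the first target
```

Same step rule, same number of steps; only the schedule differs, and only the scheduled version is still moving at the end.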

The sharper insight is about the relationship between disruption and progress. Most people try harder instead of trying different. Effort applied inside a local minimum does not escape it. Only a change in direction, a perturbation large enough to displace the current position, can relocate the body onto a slope where adaptation becomes possible again.

θ_new = θ + ε, where ε is large enough to escape the local basin

Here, ε could be a new movement that trains the same muscle from a different angle. It could be a rep range the body has not recently encountered, or a deload followed by a novel stimulus. The specific form matters less than the principle: the current position must be abandoned before a better one can be found.
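The "large enough" condition can be made concrete on an invented double-well surface (the curve and both perturbation sizes are illustrative): after converging into the shallow valley, a small nudge rolls straight back to the same minimum, while a nudge that clears the ridge lands on a new slope and descends to the deeper one.

```python
def grad(x):
    # slope of the double-well loss x**4 - 2*x**2 + 0.5*x
    # (valleys near 0.93 and -1.06, ridge between them near 0.13)
    return 4 * x**3 - 4 * x + 0.5

def descend(x, steps=300, lr=0.01):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

stuck = descend(2.0)          # converged into the shallow valley, near 0.93
small = descend(stuck - 0.5)  # ε too small: rolls back to near 0.93
large = descend(stuck - 1.5)  # ε clears the ridge: descends to near -1.06
```

Only the perturbation that abandons the current basin finds the better position; the smaller one is effort spent returning to where it started.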

The body is an honest optimiser. It descends whatever gradient it is given, and converges wherever that gradient runs out. If the inputs stop producing slope, the body stops moving. Give it a new hill to descend.