Genetic algorithms

A search process borrowed from biology.

You have conviction. This is where it gets tested. The search covers millions of signal combinations that no researcher could cover by hand.

The mechanism

How the genetic algorithm searches

Say you're watching 200 signals. Each can be transformed, lagged, and combined with every other - millions of possible combinations. Grid search can't cover that space. Manual selection is limited to what the researcher can imagine.

A genetic algorithm evolves populations of signal combinations. They cross and mutate - swapping lookback windows, introducing variables no parent carried. Each generation is scored against holdout data. What doesn't survive dies off.

Your signals are the genes. The search recombines them across generations into combinations that didn't exist before the search began, and because every component has a name, you can read what survived and ask whether the mechanism makes sense.

What survives is hard to vary - every detail in the signal does necessary work. Change one element and the edge breaks. The result isn't a model you'd design. It's one the data selected.
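
A minimal sketch of that loop in Python, on synthetic data. Everything named here is hypothetical and chosen for illustration: the two-gene candidate, the MAX_LAG limit, the population size, the keep-the-top-third rule, and correlation on a holdout window as the fitness score. It shows the shape of the search, not a production implementation.

```python
import random

MAX_LAG = 5  # hypothetical upper bound on the lookback a gene may carry


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def make_data(rng, n_signals=8, n_obs=300):
    """Synthetic data: the target is driven by signal 2 (lag 1) and signal 5 (lag 3)."""
    signals = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_signals)]
    target = []
    for t in range(n_obs):
        a = signals[2][t - 1] if t >= 1 else 0.0
        b = signals[5][t - 3] if t >= 3 else 0.0
        target.append(a + b + rng.gauss(0, 0.5))
    return signals, target


def fitness(candidate, signals, target, start, end):
    """Score a candidate (a tuple of (signal_index, lag) genes) on rows start..end-1."""
    preds = [sum(signals[i][t - lag] for i, lag in candidate) for t in range(start, end)]
    return pearson(preds, target[start:end])


def crossover(rng, a, b):
    """Each gene position is inherited from one parent or the other."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(rng, candidate, n_signals, rate=0.3):
    """With probability `rate`, swap a gene's lookback or replace its signal."""
    genes = []
    for sig, lag in candidate:
        if rng.random() < rate:
            if rng.random() < 0.5:
                lag = rng.randint(1, MAX_LAG)    # new lookback window
            else:
                sig = rng.randrange(n_signals)   # a variable no parent carried
        genes.append((sig, lag))
    return tuple(genes)


def evolve(signals, target, rng, pop_size=30, generations=25, n_genes=2):
    n = len(signals)
    split = len(target) * 2 // 3  # the last third is the holdout; must be >= MAX_LAG

    def score(c):
        return fitness(c, signals, target, split, len(target))

    pop = [tuple((rng.randrange(n), rng.randint(1, MAX_LAG)) for _ in range(n_genes))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        history.append(score(ranked[0]))
        survivors = ranked[:pop_size // 3]       # what doesn't survive dies off
        children = [mutate(rng, crossover(rng, *rng.sample(survivors, 2)), n)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=score), history
```

Calling evolve(signals, target, random.Random(7)) on the output of make_data returns the best surviving candidate and the best score per generation. Because survivors pass into the next generation unchanged, that score never falls from one generation to the next. Each gene reads as a named signal and a lag, which is what makes the survivor inspectable.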

Question one

What should you be optimizing?

This is the question most traders skip. You have a read on the market - but what exactly are you predicting? A regime shift? A mispricing caused by a forced participant? A spread that compresses because of a structural flow? These are different questions. The same data answers them differently, and the one you'd pick by instinct hasn't been tested against the alternatives.

On a recent engagement, we tested five target definitions against the same dataset. Four produced no out-of-sample signal. The one that survived wasn't the one the client expected.

How the search answers this

Populations evolve against different objectives - regime, mispricing, flow, compression. Most traders only test one. The search tests all of them.
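
As a rough illustration, here is one signal scored against several target definitions built from the same price series. The three targets (forward return, forward absolute move, forward direction) and the function names are assumptions made for this sketch, not the definitions used on any engagement. In a full search each target would serve as the fitness objective for its own population.

```python
def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


# Hypothetical target definitions: each maps (prices, t, horizon) to a number.
TARGETS = {
    "return": lambda p, t, h: p[t + h] / p[t] - 1.0,
    "abs_move": lambda p, t, h: abs(p[t + h] / p[t] - 1.0),
    "direction": lambda p, t, h: 1.0 if p[t + h] > p[t] else 0.0,
}


def score_targets(signal, prices, horizon, start, end):
    """Correlate one signal with every target on rows start..end-1.

    `end` must not exceed len(prices) - horizon, so every target is defined.
    """
    scores = {}
    for name, fn in TARGETS.items():
        ys = [fn(prices, t, horizon) for t in range(start, end)]
        scores[name] = pearson(signal[start:end], ys)
    return scores
```

Pass a holdout range as start and end. The same signal can score near zero against one target and well against another, which is the reason for testing more than one.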

Question two

What's signal and what's noise?

Most of what you're watching is the same three or four economic forces measured twenty different ways. Your model treats each one like independent information. It isn't.

The result is a model that's overconfident, because it sizes positions based on a conviction level that doesn't exist. Twenty inputs that agree aren't twenty confirmations. They're one signal echoed twenty times.
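
To put a number on the echo, assume a simplified case: n signals with equal variance and the same pairwise correlation rho. The variance of their average is then (1 + (n - 1) * rho) / n, the same as the average of n / (1 + (n - 1) * rho) independent signals. The function name below is hypothetical, and real inputs will not be equally correlated, so treat the result as a rule of thumb.

```python
def effective_count(n, rho):
    """Independent-signal equivalent of n equally correlated signals.

    Assumes equal variances and a common pairwise correlation rho in [0, 1].
    """
    return n / (1.0 + (n - 1) * rho)


# Twenty inputs that mostly agree (rho = 0.9) carry about as much
# information as a single independent signal.
print(round(effective_count(20, 0.9), 2))  # 1.1
```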

How the search answers this

The survival criterion penalizes redundancy. Signal combinations evolve toward independence - hundreds of candidates compress to the handful that survive. Each survivor represents a different economic force.
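
One simple way to express a redundancy penalty, sketched under assumptions: subtract the mean absolute pairwise correlation of a candidate's signals from its raw score. The function names and the penalty weight of 0.5 are hypothetical. Other penalties would serve equally well here.

```python
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def mean_abs_pairwise_corr(columns):
    """Average |correlation| over all pairs of signals; 0.0 if fewer than two."""
    pairs = list(combinations(columns, 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)


def penalized_fitness(raw_score, columns, penalty=0.5):
    """Raw holdout score minus a charge for signals that echo each other."""
    return raw_score - penalty * mean_abs_pairwise_corr(columns)
```

With the same raw score, a candidate built from two copies of one signal ranks below a candidate built from two uncorrelated signals, so selection pressure moves the population toward independence.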

Question three

The signal you already threw out.

You've watched a signal stop working. Returns degraded, conviction faded, you moved on. But the mechanism behind it never changed. The market still behaved the way the signal predicted. What changed is that enough participants were trading it that the profit compressed to zero.

The signal wasn't wrong. It was crowded. The information it carried - about positioning, about market structure, about who's on the other side of the trade - is still there. It just answers a different question than the one you were asking.

Most of the edge you've abandoned isn't dead. It's pointed at the wrong target.

How the search answers this

The search re-tests discarded signals against new target definitions. A signal that stopped generating returns might still predict positioning, flow, or structural shifts. The algorithm finds where it's pointed now, not where it was pointed when you gave up on it.

Question four

When does it break?

If the signals your model relies on stopped working today, how long would it take you to notice? Not whether it would happen - it will. How long between the signal dying and you realizing it?

Most traders find out after months of unexplained underperformance. By then the drawdown is deep and the conviction is gone.

How the search answers this

The process re-runs on new data. When what mattered stops mattering, signal importance shifts and fitness degrades. You see the regime change in the search before you see it in the P&L. Edge expires. The question is whether you find out from the data or from your P&L.
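
A minimal sketch of that kind of monitoring: re-score a signal on rolling windows and flag the first window whose fitness falls below a fraction of its early baseline. The window length, the step, the 50% threshold, and the function names are all assumptions for illustration, and the sketch assumes the baseline fitness is positive.

```python
def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def rolling_fitness(signal, target, window=50, step=10):
    """Return (window_end, correlation) for each rolling window."""
    fits = []
    for end in range(window, len(signal) + 1, step):
        lo = end - window
        fits.append((end, pearson(signal[lo:end], target[lo:end])))
    return fits


def first_decay(fits, baseline_windows=3, threshold=0.5):
    """End index of the first window scoring below threshold * baseline, else None.

    The baseline is the mean fitness of the first `baseline_windows` windows.
    """
    baseline = sum(f for _, f in fits[:baseline_windows]) / baseline_windows
    for end, fit in fits[baseline_windows:]:
        if fit < threshold * baseline:
            return end
    return None
```

The flag is raised from the signal's relationship to its target, not from realized returns, so it does not wait for a drawdown to show up.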

The model doesn't find edge.
The signals do.

Once the search identifies what to optimize, which signals survive, and how the regime shifts, the choice of model is almost trivial. A neural net, a linear regression, a simple threshold rule. They all work when the upstream decisions are right. They all fail when those decisions are wrong.

Most people start with the model. We start with the question.

If the edge is real, you'll know. If it isn't, we'll say so.

If you want to test it, here's how it works. See services
