What Does the Fitness Function Do?
The fitness function is the basis of the “survival of the fittest” premise of genetic algorithms. It is responsible for evaluating each parameter set and choosing which parameter sets mate. The most difficult part is designing the function so that it produces parameter sets that remain reliable and effective on data outside of the training set.
It helps to consider nature’s fitness function: we are the result of millions of years of genetic optimization, yet we retain neither the brawn of a gorilla, nor the size of a sauropod (dinosaurs reportedly weighing up to 209 tons), nor the predatory skills of a Tyrannosaurus. A genetic algorithm does not optimize for the strongest creature alone, but for the creature that can survive and thrive in all circumstances. Dinosaurs were clearly at the top of the food chain and thriving 65 million years ago, yet insects easily outlived them thanks to their ability to survive the harsh aftermath of the Cretaceous-Tertiary extinction event. (Can you tell I have been researching a lot about dinosaurs since starting this blog?)
My point is that you need a fitness function that produces a parameter set which performs well during a bull run, during a bear run, and through a market crash. A parameter set that makes a fortune on rallies but bleeds on sideways patterns and reversals is no better than the dinosaurs: ultimately it will perish, taking a lot of your equity with it.
What Makes a Good Fitness Function?
A fitness function can be as simple as the profit generated by running your rules over the training data, but such a function is likely to exploit one-time events in the data rather than place an emphasis on reliability.
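For reference, a profit-only fitness function might look like the sketch below. The names here are illustrative assumptions: `run_rules` is a hypothetical stand-in for your own trading rules, returning the profit (or loss) for one period of data.

```python
# A profit-only fitness function: the simplest possible baseline, and the
# kind most likely to overfit to one-time events in the training data.
# `run_rules` is a hypothetical stand-in for your own trading rules.

def naive_fitness(parameters, training_data, run_rules):
    """Return total profit from running the rules over the training data."""
    return sum(run_rules(parameters, period) for period in training_data)

# Dummy example: pretend each period's profit is precomputed.
profits = [1.5, -0.5, 2.0, -1.0]
print(naive_fitness(None, profits, lambda params, period: period))  # 2.0
```

Note that this score says nothing about how much risk was taken to earn the profit, which motivates the guidelines below.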
A good fitness function does the following:
- Understands Risk – it does not evaluate profit alone, but also how much capital the rules placed at risk to earn that profit.
- Punishes Losses Heavily – by punishing a parameter set more heavily for losses than it rewards it for profits, you train it to favor consistent profits over volatile returns.
- Punishes High Risk – any rule set can earn a lot on a good day by loading up on beta; you want to train your algorithm to seek true alpha.
- Does Not Punish Zero Gains – it is important to let your algorithm learn when to enter the market and when to stay clear. Providing some incentive to simply avoid a loss can be just as important as providing an incentive to take a large gain.
- Runs on a Reasonable Time Frame – a fitness function should evaluate each day (or shorter period) of sample data on its own, accumulating the results for a particular parameter set.
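Putting the guidelines together, a fitness function along these lines might look like the sketch below. Everything here is an assumption to be tuned for your own strategy: `run_rules` is a hypothetical function returning one day's profit and the capital it put at risk, and `loss_weight`/`risk_weight` are arbitrary starting values.

```python
def fitness(parameters, training_data, run_rules,
            loss_weight=2.0, risk_weight=0.25):
    """Score a parameter set day by day, accumulating the results."""
    score = 0.0
    for day in training_data:
        profit, capital_at_risk = run_rules(parameters, day)
        if profit < 0:
            score += profit * loss_weight   # punish losses more heavily
        else:
            score += profit                 # reward gains; zero days score zero
        score -= capital_at_risk * risk_weight  # punish risk even on winning days
    return score

# Dummy example: (profit, capital_at_risk) pairs per day. The flat day
# (0.0, 0.0) is not punished at all.
days = [(10.0, 100.0), (0.0, 0.0), (-5.0, 50.0)]
print(fitness(None, days, lambda params, day: day))  # -37.5
```

Notice that the winning day barely breaks even once its risk is charged against it, while the flat day costs nothing, which is exactly the behavior the guidelines call for.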
Following these guidelines, the fitness function must rank each parameter set and select mates.
Once the parameter sets have been ranked they must undergo selection. The obvious solution would be to select only the top-ranked parameter sets to mate, but this may ignore other minima that lower-ranked parameter sets are exploring.
The chart above illustrates the importance of occasionally exploring lower-ranked parameter sets. The green lines represent the highest-ranked parameter sets, but as we can see on the parameter space, the red line is at the base of the global minimum, while the green lines are only exploring local minima. The best way to allow for this is to select mates with an absolute-valued (half-) normal distribution over the ranks. The choice of probability distribution and standard deviation has a large effect on how fast a genetic algorithm converges; an analysis of this will appear in a future article. For now the normal distribution proves more than adequate.
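One way to sketch this selection step, assuming the population is already sorted best-first, is to draw an index from the absolute value of a normal sample (a half-normal distribution): index 0 is the most likely pick, but lower-ranked sets still get chosen occasionally. The default `sigma` below is an arbitrary assumption.

```python
import random

def select_mate(ranked_population, sigma=None):
    """Pick a mate from a best-first ranked population using |N(0, sigma)|."""
    n = len(ranked_population)
    if sigma is None:
        sigma = n / 3.0  # assumed default; smaller sigma = greedier selection
    index = min(int(abs(random.gauss(0.0, sigma))), n - 1)
    return ranked_population[index]

# Dummy example: rank labels stand in for parameter sets.
random.seed(42)
ranked = [f"rank-{i}" for i in range(10)]
picks = [select_mate(ranked) for _ in range(1000)]
# Top-ranked sets are chosen most often, but the tail still appears.
print(picks.count("rank-0") > picks.count("rank-9"))  # True
```

Shrinking `sigma` makes selection greedier and speeds convergence at the cost of exploration; growing it does the opposite.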
As you can see, the fitness function has a huge impact on the output of your optimization: it defines what the ideal parameter set should do.
Tune in for more Genetic Optimization in Part 4 where I will talk about Training.