Monthly Archive for March, 2007

Intraday Historical Stock Viewer

I have created a simple utility to allow you to view historical intraday data on any stock in OpenTick’s expansive database. It is very basic, but it lets you select any exchange/symbol/date and view the raw ticks or 1/5/10/15/30/60-minute bars for the stock on that day.

I realize many brokers provide tools and data to do this (I know Interactive Brokers does, at least), but I really wanted something simple and open source that I could hack to bits to add my own indicators and so forth.

Please let me know if you find this useful or have any problems – it is a C# app that uses OpenTick’s database (you must be a registered user) and ZedGraph to draw the graphs.

Below are a series of screenshots to illustrate the features.


Startup screen. On first load you must press “Config” and enter your OpenTick username and password.


Enter your username and password in the dialog boxes, along with any changes to the OpenTick servers (nothing should need to change…).


Enter the exchange and symbol you are looking for – OpenTick’s exchange codes are listed here.
Please note the standard version is configured for West Coast users (it downloads data from 6:30 AM to 1 PM Pacific). If you are in a different time zone, please modify the source appropriately.


This is the initial chart that will appear (5 minute bars).


This is the raw ticks display (the volume is just the running total volume at any given point in the day). You can change the timebase by clicking on the combo box at the top left, as illustrated above.


The volume bars are synchronized with the share price axis, so if you zoom around, the two maintain sync.

 

Please post a comment if you have any problems, and I will get back to you promptly.

Download Here

Genetic Optimization and Maximization – Fitness Function

This is part 3 in a series on Genetic Optimization, please visit part 1 and part 2 to catch up.

What Does the Fitness Function Do?

The fitness function is the basis of the “survival of the fittest” premise of genetic algorithms. It is responsible for evaluating the parameter set, and choosing which parameter sets mate. The most difficult part of the fitness function is designing the function to produce parameters that are reliable and effective on data outside of the training set.

It helps to consider nature’s fitness function: we are the result of millions of years of genetic optimization, yet we retain neither the brawn of a gorilla, nor the size of a sauropod (a dinosaur that weighed 209 tons), nor the predatory skills of a Tyrannosaurus. A genetic fitness function does not just optimize for the strongest creature, but for the creature that can survive and thrive in all circumstances. Dinosaurs were clearly at the top of the food chain and thriving 65 million years ago, yet they were easily outlived by insects, whose ability to survive the harsh aftermath of the Cretaceous-Tertiary extinction event proved more valuable. (Can you tell I have been researching a lot about dinosaurs since starting this blog?)

My point is that you need a fitness function that produces a set of parameters which performs well during a bull run and a bear run, and also survives a market crash. A parameter set that makes a fortune on rallies but bleeds on sideways patterns and reversals is no better than the dinosaurs: ultimately it will perish, taking a lot of your equity with it.

What Makes a Good Fitness Function?

A fitness function can be as simple as the profit generated by running your rules over the training data, but such a function is likely to exploit one-time events in the data and place no emphasis on reliability.

A good fitness function does the following (a rough sketch follows the list):

  • Understands Risk – it does not evaluate profit alone, but also how much capital the rules placed at risk to earn that profit.
  • Punishes Losses Heavily – by punishing a parameter set more heavily for losses than it rewards profits, you train it to favor consistent profits over volatile returns.
  • Punishes High Risk – any set of rules can earn a lot on a good day by loading up on beta; you want to train your algorithm to seek true alpha.
  • Does not punish zero gains – it is important to let your algorithm learn when to enter the market and when to stay clear. Providing some incentive to simply not take a loss can be just as important as providing incentives to take a large gain.
  • Runs on a reasonable time frame – a fitness function should evaluate each day (or possibly a shorter period) of sample data on its own, accumulating the results for a particular parameter set.
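
To make these guidelines concrete, here is a minimal sketch of how a daily score might be accumulated. This is not the fitness function I actually use – DayResult and lossPenalty are hypothetical names, and the sketch assumes you already have per-day backtest results for a parameter set.

// Hypothetical per-day backtest result for one parameter set.
class DayResult
{
    public double Profit;        // net profit or loss for the day
    public double CapitalAtRisk; // capital the rules put at risk that day
}

// Accumulates a risk-adjusted score over the sample days: flat days score
// zero rather than being punished, and losses are weighted more heavily
// than equivalent gains (lossPenalty > 1).
static double Fitness(DayResult[] days, double lossPenalty)
{
    double score = 0.0;
    foreach (DayResult day in days)
    {
        // Risk-adjust the day's result; a day spent out of the market
        // contributes nothing, which leaves a zero gain unpunished.
        double riskAdjusted = day.CapitalAtRisk > 0
            ? day.Profit / day.CapitalAtRisk
            : 0.0;

        score += riskAdjusted >= 0
            ? riskAdjusted
            : lossPenalty * riskAdjusted;
    }
    return score;
}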

Following these guidelines, the fitness function must rank each parameter set and select mates.

Mate Selection

Once the parameter sets have been ranked they must undergo selection. The obvious solution would be to only select the top ranked parameters to mate, but this may ignore other minima that lesser parameter sets are exploring.

The chart above illustrates the importance of occasionally exploring lesser-ranked parameter sets. The green lines represent the highest ranked parameter sets, but as we can see on the parameter space, the red line is at the base of the global minimum, while the green lines are only exploring local minima. The best way to allow for this is to select mates using an absolute-valued normal distribution over the rankings. The choice of probability distribution and standard deviation has a large effect on how quickly a genetic algorithm converges; an analysis of this will appear in a future article. For now the normal distribution proves to be more than adequate.
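
As a rough illustration (not my exact selection code), the following sketch picks a mate’s rank using an absolute-valued normal distribution: index 0 is the top-ranked parameter set, and lower-ranked sets are chosen with decreasing but non-zero probability.

// Picks an index into a population sorted best-to-worst. stdDev controls how
// far down the rankings the selection is willing to reach.
static int SelectMateIndex(int populationSize, double stdDev, Random rand)
{
    // Box-Muller transform: turn two uniform draws into a standard normal sample.
    double u1 = 1.0 - rand.NextDouble();
    double u2 = rand.NextDouble();
    double gaussian = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);

    // Take the absolute value so rank 0 (the best set) is the most likely pick.
    int index = (int)Math.Abs(gaussian * stdDev);

    // Never reach past the end of the ranked list.
    return Math.Min(index, populationSize - 1);
}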

As you can see, the fitness function has a huge impact on the output of your maximization: it defines what the ideal parameter set should do.

Tune in for more Genetic Optimization in Part 4 where I will talk about Training.

Genetic Optimization and Maximization – Encoding

Genetic Optimization / Maximization

This is part two in a series of articles on Genetic Optimization / Maximization. Please visit part 1 to get up to speed.

What is it?

Genetic optimization uses the principles of survival of the fittest to determine the ideal parameters. It does this by first creating a random population of parameter sets (or starting with a seed population) and evaluating each parameter set’s performance. The algorithm then “mates” the highest-performing parameter sets together, using the genetic concepts of mutation and crossover to create “children”. Wikipedia does a great job of covering what genetic algorithms are, so I won’t repeat the theory here, but will dive into their application to a rules-based system.

The basic components of a Genetic Algorithm are:

  1. Encoding & Mating
  2. The Fitness Function
  3. Training

I will attack each of these in a different article, and begin here with encoding & mating.

Applying the Genetic Algorithm to our Rules

The basic concept here is that we have to “encode” our parameters in such a way that we can apply crossover and mutation during each mating. This will depend largely on the number of parameters you are tuning for, but there are really only two options: binary encoding or chain encoding.

Binary Encoding

Binary encoding is when you convert all of your parameters to their raw binary form and string them together, effectively creating a genome for your parameter set. In our example above, we are using two double values, which encoded in binary form consist of 128 bits of data. Thus, to create the random seed population, a uniform distribution is used to randomly select each of the 128 bits. The bits are then read back as doubles, each parameter set is tested, and the results are fed into the fitness function to select the parents of the next generation.
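
As a quick illustration (a hypothetical helper, not code from the application), seeding one random genome for the two-double example might look like this:

// Builds one random 128-bit genome and decodes it back into the two doubles
// from the moving-average example (shortMATime and longMATime).
static double[] RandomGenome(Random rand)
{
    byte[] genome = new byte[16];   // 2 doubles * 8 bytes = 128 bits
    rand.NextBytes(genome);         // every bit drawn from a uniform distribution

    double shortMATime = BitConverter.ToDouble(genome, 0);
    double longMATime = BitConverter.ToDouble(genome, 8);
    return new double[] { shortMATime, longMATime };
}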

The nice part of binary encodings is that they are very easily mated. Once the two parents have been selected (I will cover the fitness function in the next article), the mating is performed with crossover, effectively selecting groupings of bits from each parent, and then with mutation, randomly flipping some of the child’s bits. A simple version of this algorithm is sketched below.

// Simple uniform random helper used for the crossover and mutation decisions.
static readonly Random Rand = new Random();
static double RandDouble(double min, double max)
{
    return min + Rand.NextDouble() * (max - min);
}

// Mates two bit-encoded parents into one child using crossover and mutation.
static bool[] BitEncodeMate(bool[] mom, bool[] dad,
    double crossoverPercent, double mutationPercent)
{
    bool[] child = new bool[mom.Length];
    bool fromMom = true;
    for (int ix = 0; ix < mom.Length; ix++)
    {
        // Crossover: occasionally switch which parent we are copying from.
        if (RandDouble(0, 1) < crossoverPercent)
            fromMom = !fromMom;
        child[ix] = fromMom ? mom[ix] : dad[ix];

        // Mutation: rarely flip the bit we just copied.
        if (RandDouble(0, 1) < mutationPercent)
            child[ix] = !child[ix];
    }
    return child;
}

As you can see, this function is a very simplistic model for mating two parents. I introduced two variables, crossoverPercent and mutationPercent. These variables are critical to the performance of the genetic algorithm and are the topic for another blog post. Suffice it to say that RandDouble draws from a simple uniform distribution, crossoverPercent drives how many crossovers occur, and mutationPercent should be quite low (think 5% or below) in order to provide good coverage while still converging.

The trouble with bitwise encoding is that you cannot easily bound your variables. This encoding will randomly cover the entire range of the double type, including infinities and uninterpretable (NaN) values. The child could be bounds-checked after each mating and constrained to values within acceptable ranges, but as you can see this gives you very little control over the distribution of each parameter.
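
A post-mate validity check along these lines (a sketch, not code from the project) could repair each decoded parameter:

// Repairs a decoded parameter after mating: NaN and infinity are replaced
// with an arbitrary mid-range value, and out-of-range values are clamped.
static double SanitizeParameter(double value, double min, double max)
{
    if (double.IsNaN(value) || double.IsInfinity(value))
        return min + (max - min) / 2.0;
    if (value < min) return min;
    if (value > max) return max;
    return value;
}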

To summarize, bit encoding provides a very easy way to encode binary data and allows crossover to occur even with a small parameter space. Unfortunately, it provides very little control over the ranges of the parameters or the distribution that the mutations will take. Despite this, it is still the most common form of encoding, and it works much better for integers than for floating point numbers: you can constrain the number of bits for a particular integer to constrain its range, but you still cannot control its distribution.

Chain Encoding

Chain encoding works best for a high-dimensional parameter space, where large speed gains may be had by bounding each parameter. This is the more typical case for rules-based engines. While the function I provided above has only two parameters, it is much more common to have 10 to 20 parameters that need tuning.

Here the parameter sets are simply encoded as an array of doubles (or whatever datatypes you want), and the crossover operation is performed per parameter rather than per bit. The big advantage now is that mutation can be carefully controlled. If you are able to provide a close starting point, the algorithm may use a normal probability distribution to mutate about the current value. You may also continue to operate on a uniform distribution and simply constrain each parameter to its range. The following is a sketch of a function that mates two chain-encoded parents, with mutations constrained to a range.

// Mates two chain-encoded parents; each gene is a double bounded by
// minRange/maxRange, and crossover happens per parameter rather than per bit.
static double[] ChainEncodeMate(double[] mom, double[] dad,
    double crossoverPercent, double mutationPercent,
    double[] maxRange, double[] minRange)
{
    double[] child = new double[mom.Length];
    bool fromMom = true;
    for (int ix = 0; ix < mom.Length; ix++)
    {
        // Crossover: occasionally switch which parent the parameter comes from.
        if (RandDouble(0, 1) < crossoverPercent)
            fromMom = !fromMom;
        child[ix] = fromMom ? mom[ix] : dad[ix];

        // Mutation: replace the parameter with a uniform draw from its range.
        if (RandDouble(0, 1) < mutationPercent)
            child[ix] = RandDouble(minRange[ix], maxRange[ix]);

        // Clamp to the allowed range in case a parent was out of bounds.
        if (child[ix] < minRange[ix])
            child[ix] = minRange[ix];
        if (child[ix] > maxRange[ix])
            child[ix] = maxRange[ix];
    }
    return child;
}

Clearly this code is very similar to that used for binary encoding, but the power to control each parameter can have a dramatic effect on run time (again, this will be analyzed in a later article). This algorithm does have some trouble with narrow parameter ranges, though: with the uniform mutation above, every time a parent mates, the child either very closely resembles its parents or simply conducts a random walk of the parameter space. It is again important to control mutationPercent; while we are now in control of the distribution, we do not want to modify all of the parameters very often. As much as possible, we would prefer mutation to happen in one dimension at a time, and to let the parental variability converge on the solution.

The above chart illustrates the results of 1,674 generations run against a rules system, plotted for one of seven parameters. As you can see, the mutations caused the parameter to be exposed over a large part of its given range (0 to 100), while generally sticking with values it had found to be profitable (hence the vertical piers).
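
If you can provide a close starting point, as mentioned above, one possible refinement (a sketch only, not the code that produced the chart) is to mutate a single parameter by a normal perturbation about its current value instead of redrawing it uniformly over its whole range:

// Perturbs one parameter of a chain-encoded child by a normally distributed
// amount, then clamps it back into its allowed range.
static void GaussianMutate(double[] child, int ix, double stdDev,
    double[] minRange, double[] maxRange, Random rand)
{
    // Box-Muller transform for a standard normal sample.
    double u1 = 1.0 - rand.NextDouble();
    double u2 = rand.NextDouble();
    double gaussian = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Sin(2.0 * Math.PI * u2);

    child[ix] += gaussian * stdDev;

    if (child[ix] < minRange[ix]) child[ix] = minRange[ix];
    if (child[ix] > maxRange[ix]) child[ix] = maxRange[ix];
}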

That’s all for encoding and mating – next up we will cover the fitness function!

Optimization

Rules based Optimization

Before I discuss my second Automated Trading System (ATS), I need to explain the principle on which it operates. A rules-based ATS depends on carefully chosen thresholds and parameters to determine when a particular stock should be entered and exited (long or short). Experience and theory can provide an excellent starting point, but to perform really well for a particular stock, it is useful to optimize these parameters over historical data.

Think about your set of rules as though it were a function.

// Conceptual only – the rules are described in comments rather than implemented.
void rules(double shortMATime, double longMATime)
{
    // Enter long when the shortMATime-minute moving average crosses above
    // the longMATime-minute moving average.
    // Exit when the longMATime-minute moving average crosses above
    // the shortMATime-minute moving average.
}

In this example, there are two rules that are executed over live or historical data, with two parameters, shortMATime and longMATime. We would like to select values for these parameters that would have made the most money over the last week (or any other time frame), assuming this closely represents which values will make the most money tomorrow. This function is very difficult to maximize, as it is not continuous: small adjustments to either parameter can cause huge swings in the profitability of the system.

In this particular case, you may consider running the entire variable space through the function, setting each parameter from 0 to 1000 minutes, incrementing by one second, and taking the maximum output when you are done. This turns out to be roughly (1000 * 60)² = 3,600,000,000 runs. Assuming a very fast 5 seconds per run for a week’s worth of data, this would take 570.77 years to process. Clearly a better maximization function is needed, but it cannot depend on the derivative of the function, nor can it require continuity of the first order. This is exactly where genetic algorithms shine.
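
For reference, the brute-force search dismissed above would look roughly like the sketch below, where Backtest is a hypothetical stand-in for running the two rules over a week of historical data:

// Exhaustive grid search over both moving-average periods. At roughly
// 5 seconds per Backtest call, this loop would take centuries to finish.
static void BruteForceSearch()
{
    const int steps = 1000 * 60;   // 0 to 1000 minutes in one-second increments

    // steps * steps = 3,600,000,000 combinations in total.
    for (int shortSec = 0; shortSec < steps; shortSec++)
        for (int longSec = 0; longSec < steps; longSec++)
            Backtest(shortSec / 60.0, longSec / 60.0);
}

// Hypothetical stub: run the rules over a week of historical data and return
// the profit for the given moving-average periods (in minutes).
static double Backtest(double shortMATime, double longMATime)
{
    return 0.0; // placeholder
}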

My next series of articles will cover Genetic Optimization in detail. Stay tuned for more updates.

FTP Image Uploader

I am developing an application to let our wedding guests upload their digital images to a central web site (the disposable wedding cameras of a digital era). In an ideal world, I would set up a website whose address we would hand out to all of our guests. They would go to the site and simply drag and drop their files onto the server, much like Kodak Gallery or Shutterfly, shown below.

Kodak Gallery Easy Uploader

Shutterfly Uploader

As you can see, these options present a very intuitive upload experience. Shutterfly in particular is literally as simple as dragging pictures on; it immediately begins to upload the pictures asynchronously, two at a time, placing any additional pictures you drag in onto a queue. This is something within the realm of my family’s understanding.

The clear advantage of these components is that they are simple to use and work through your web browser; however, they are a pain to install, as shown below.

ActiveX installer

Java applets seem to be the most violent to work with, though they offer the best packages, such as radlinks or JavaAtWork. However, I have experimented with both of these packages, and the bottom line is that too many antivirus / malware protection programs disable Java / ActiveX, and ultimately I do not want to have to explain how to use the Advanced settings in Internet Explorer to work around this.

I believe this leaves me with writing a custom program that our guests download and run. Ultimately, this amounts to an easy-to-use, pre-programmed FTP window.

I would love to do this through Flash, but I have yet to see a solution to this problem. I realize sites like box.net seem to have solved the problem, but I believe they too use an ActiveX control.

Let me know if you know of a way around this, or are aware of a better solution, otherwise part II of this article will cover the development of this application.