Overview of Matthew's genetic algorithm library

This document outlines the contents of the library and presents some of the design philosophy behind the implementation. Some source code samples are provided at the end of the page to illustrate basic program structure, operator customization, and derivation of new genome classes.

When you use the library you will work primarily with two classes: a genome and a genetic algorithm. Each genome object represents a single solution to your problem. The genetic algorithm object defines how the evolution should take place. In addition to these two objects you must create an objective function. The objective function tells the GA how well each genome 'performs'.

When you use a genetic algorithm to solve an optimization problem, you must be able to represent a single solution to your problem in a single data structure. The genetic algorithm will create a population of solutions based on a sample data structure that you provide. The genetic algorithm then operates on the population to evolve the best solution. In GAlib, the sample data structure is called a GAGenome (some people refer to it as a chromosome). The library contains four types of genomes: GAListGenome, GATreeGenome, GAArrayGenome, and GABinaryStringGenome. These classes are derived from the base GAGenome class and a data structure class as reflected in their names. For example, the GAListGenome is derived from the GAList class as well as the GAGenome class.

Genetic Algorithm and Genome Operators

The genetic algorithm uses three primary operators to do its work: selection, crossover, and mutation. Selection is the method by which candidate parents are picked from the population. Mating is done using the crossover operator. Mutation occurs to inject new genetic material into the population. The selection method is defined as part of the genetic algorithm object. Crossover and mutation are genome-based.

Each genome has three primary operators: initialization, mutation, and crossover. With these operators you can bias an initial population, define a mutation or crossover specific to your problem's representation, or evolve parts of the genetic algorithm as your population evolves. GAlib comes with these operators pre-defined for each genome type, but you can customize any of them.

The initialization operator determines how the genome is initialized. It is called when you initialize a population or the genetic algorithm. In addition to the genome initialization operators, the population object has its own initialization operator. By default this simply calls the initialization operators of the genomes in the population, but you can customize it to do whatever you want.

The mutation operator defines the procedure for mutating the genome. Mutation means different things for different data types. For example, a typical mutator for a binary string genome flips the bits in the string with a given probability. A typical mutator for a tree, on the other hand, would swap subtrees with a given probability.

The crossover operator defines the procedure for generating a child from two parent genomes. It is used in conjunction with a crossover site object. Like the mutation operator, crossover is specific to the data type.

Each of these methods can be customized so that it is not only specific to the data type, but also to the problem type. This is one way you can put some problem-specific 'intelligence' into the genetic algorithm (I won't go into a discussion about whether or not this is a good thing to do...)

The library has some basic types built in, but if you already have an array or list object, for example, then you can quickly build a genome from it by multiply inheriting from your object and the genome object. You can then use this new object directly in the genetic algorithm.

In general, a genetic algorithm does not need to know about the contents of the data structures on which it is operating. The library reflects this generality. You can mix and match genome types with genetic algorithms. The genetic algorithm knows how to clone genomes in order to create populations, initialize genomes to start a run, cross genomes to generate children, and mutate genomes. All of these operations are performed via the genome member functions.

The genetic algorithm contains the selection method, statistics, replacement strategy, and parameters for running the algorithm. A typical genetic algorithm will run forever. The library has built in functions for specifying when the algorithm should terminate. These include terminate-upon-generation, in which you specify a certain number of generations for which the algorithm should run, and terminate-upon-convergence, in which you specify a value to which the best-of-generation score should converge. You can customize the termination function to use your own stopping criterion.

Genetic Algorithm Types

The library contains three flavors of genetic algorithms. The first is the standard 'simple GA' described by Goldberg in his book. This algorithm uses non-overlapping populations and optional elitism. The second is a 'steady state GA' that uses overlapping populations and custom replacement. In this variation, you can specify how much of the population should be replaced in each generation. The third variation is the 'incremental GA', in which each generation consists of only one or two children. Both the steady state and incremental genetic algorithms allow custom replacement methods to define how the new generation should be integrated into the population.

The base GA class contains operators and data common to most flavors of genetic algorithms. You can derive your own GA class if you want to do a special kind of variation.

The Population Object and Fitness Scaling

The population object is a container for genomes. Each population object has its own initializer (the default simply calls the initializer for each individual in the population) and evaluator (the default simply calls the evaluator for each individual in the population). It also keeps track of the best, average, deviation, etc for the population.

Each population object has a scaling scheme object associated with it. The scaling scheme object converts the objective score of each genome to a fitness score that the genetic algorithm uses for selection. It also caches fitness information for use later on by the selection schemes.

It is important to note the distinction between fitness and objective scores. The objective score is the value returned by the objective function. The fitness score is the (possibly scaled) value used to determine fitness for mating. They are not necessarily the same! For example, if you use linear scaling then the fitness scores are derived from the objective scores using the fitness proportional scaling technique described in Goldberg's book. The genetic algorithm uses the fitness scores, not the objective scores, to do selection.

You can evaluate the individuals in a population using an individual-based evaluation function (which you define), or a population-based evaluator (also which you define).

So what does it look like in C++?

A typical optimization program has the following form:
main(){
  GA2DBinaryStringGenome genome(width, height, objective);     // create a genome
  GASimpleGA ga(genome);                            // create the GA
  ga.evolve();                                      // evolve the GA
  cout << ga.bestOfAll() << endl;                   // print out the best solution
}
You can very easily change the behaviour of the genetic algorithm by setting various parameters. Some of the more common ones are set like this:
  ga.populationSize(popsize);
  ga.nGenerations(ngen);
  ga.pMutation(pmut);
  ga.pCrossover(pcross);

  GASigmaTruncationScaling sigmaTruncation;
  ga.scalingScheme(sigmaTruncation);
And a typical (albeit simple) objective function looks like this (this one gives a higher score to a 1D binary string genome that contains all 1s):
float
objective(GAGenome & g)
{
  GA1DBinaryStringGenome & genome = (GA1DBinaryStringGenome &)g;
  float score=0.0;
  for(int i=0; i<genome.length(); i++)
    score += genome.gene(i);
  return score;
}
When you write an objective function, you must first cast the generic genome into the type of genome that your objective function is expecting. From that point on you can work with the specific genome type. Each objective function returns a single value that represents the objective score of the genome that was passed to the objective function.

Please see the examples for more samples of the library in action. And see the programming interface page for a complete list of member functions and built-in operators.

What about defining my own operators?

Defining the operators is only as difficult as figuring out the algorithm you want to implement. As far as the actual implementation goes, there's not much to it. To assign an operator to a genome, just do this:
  GA1DBinaryStringGenome genome(20);
  genome.initializationOperator(MyInitializer);
If you do this to the first genome (the one you use to create the genetic algorithm) then all of the ones that the GA clones will have the same operators defined for them.

The crossover operator is passed three genomes: the child and its two parents. The mutation operator is passed a genome and a mutation probability. It returns the number of mutations that occurred. The initialization operator is passed a single genome. In each case, the genomes are generic; each operator must cast the generic genome into the type that it is expecting before it does its operations.



The definition for the ListPartialMatchCrossover looks like this:
This crossover operator is used on order-based lists.
void
GAListPartialMatchCrossover(GAGenome & c, const GAGenome & a, const GAGenome & b)
{
  GAListGenome<LIST_TYPE> &child=(GAListGenome<LIST_TYPE> &)c;
  GAListGenome<LIST_TYPE> &mom=(GAListGenome<LIST_TYPE> &)a;
  GAListGenome<LIST_TYPE> &dad=(GAListGenome<LIST_TYPE> &)b;
  GAListMatchCrossSite<LIST_TYPE> *xsite = 
	(GAListMatchCrossSite<LIST_TYPE> &)child.crossoverSite();

  if(xsite.a == gaNoXSite || xsite.b == gaNoXSite) xsite.pick(c);

  child.GAList<LIST_TYPE>::copy(mom);
  GAListIter<LIST_TYPE> iter(dad);

  LIST_TYPE *index = iter.warp(xsite.a);
  for(int i=xsite.a; i<xsite.b; i++, index=iter.next()){
    if(*child.head() == *index){
      child.swap(i,0);
    }
    else{
      for(int j=1; (j<child.size()) && (*child.next() != *index); j++);
      child.swap(i,j);	// no op if j>size
    }
  }

  child.head();		// set iterator to head of list
}


The definition for 1DArrayFlipMutator looks like this:
This mutator flips the value of a single element of the array to any of the possible allele values.
int 
GA1DArrayFlipMutator(GAGenome & c, float pmut)
{
  GA1DArrayGenome<ARRAY_TYPE> &child=(GA1DArrayGenome<ARRAY_TYPE> &)c;
  register int n, i;
  if(pmut <= 0.0) return(0);

  float nMut = pmut * (float)(child.length());
  int na = child.alleles().size()-1;
  if(nMut < 1.0){		// we have to do a flip test on each bit
    nMut = 0;
    for(i=child.length()-1; i>=0; i--){
      if(GAFlipCoin(pmut)){
	child.gene(i, child.alleles().allele(GARandomInt(0, na)));
	nMut++;
      }
    }
  }
  else{				// only flip the number of bits we need to flip
    for(n=1; n<nMut; n++){
      i = GARandomInt(0, child.length()-1); // the index of the bit to flip
      child.gene(i, child.alleles().allele(GARandomInt(0, na)));
    }
  }
  return((int)nMut);
}


And the definition for a typical initializer looks like this:
This initializer creates a tree of bounded random size and forkiness.
void
TreeInitializer(GAGenome & c)
{
  GATreeGenome<Point> &tree=(GATreeGenome<Point> &)c;

  tree.root();
  tree.destroy();   // destroy any pre-existing tree

  Point p(0,0,0);
  tree.insert(p,GATreeBASE::ROOT);
  int n = GARandomInt(0,MAX_CHILDREN);	// limit number of children
  for(int i=0; i<n; i++)
    DoChild(tree, 0);
}

void
DoChild(GATreeGenome<Point> & tree, int depth)
{
  if(depth >= MAX_DEPTH) return;     // limit depth of the tree
  int n = GARandomInt(0,MAX_CHILDREN);	// limit number of children

  Point p(GARandomFloat(0,25),GARandomFloat(0,25),GARandomFloat(0,25));
  tree.insert(p,GATreeBASE::BELOW);

  for(int i=0; i<n; i++)
    DoChild(tree, depth+1);

  tree.parent();		// move the iterator up one level
}


What about deriving my own genome class?

Here is the definition of a genome that contains an arbitrary number of lists. It could easily be modified to become a diploid genome. It is used in exactly the same way that the built-in genomes are used. For a simpler example, see the GNU example which integrates the GNU BitString object to form a new genome class.
class RobotPathGenome : public GAGenome {
  GADefineIdentity("RobotPathGenome", 200):

public:
  RobotPathGenome(int nrobots,
		  GAObjectiveFunction f=NULL, void * u=NULL, GA * g=NULL) :
  GAGenome(RobotPathInitializer, RobotPathMutator, RobotPathCrossover, &__robotPathCrossSite){
    of=f; ud=u; ga=g; n = nrobots;
    list = (n ? new GAListGenome<int> * [n] : NULL);
    for(int i=0; i<n; i++){
      list[i] = new GAListGenome<int>(of,ud,ga);
      list[i]->initializationOperator(ListInitializer);
      list[i]->mutationOperator(ListMutator);
      list[i]->crossover(ListCrossover, __listCrossSite);
    }
  }
  RobotPathGenome(const RobotPathGenome & orig)
    {n = 0; list = NULL; copy(orig);}
  RobotPathGenome operator=(const GAGenome & arg)
    {copy(arg); return *this;}
  RobotPathGenome operator=(const RobotPathGenome & arg)
    {copy(arg); return *this;}
  virtual ~RobotPathGenome(){
    for(int i=0; i<n; i++)
      delete list[i];
    delete [] list;
  }
  virtual GAGenome *clone(const GACloneMethod flag=gaCloneContents) const {
    RobotPathGenome * cpy = new RobotPathGenome(0);
    cpy->GAGenome::copy(*this);
    cpy->n = n;
    cpy->list = new GAListGenome<int> * [n];
    for(int i=0; i<n; i++)
      cpy->list[i] = (GAListGenome<int> *)list[i]->clone(flag);
    return cpy;
  }
  virtual void copy(const GAGenome & c){
    if(&c == this) return;
    RobotPathGenome & rc = (RobotPathGenome &)c;
    GAGenome::copy(c);
    for(int i=0; i<n; i++)
      delete list[i];
    delete [] list;
    n = rc.n;
    list = new GAListGenome<int> * [n];
    for(i=0; i<n; i++)
      list[i] = (GAListGenome<int> *)rc.list[i]->clone();
  }

  friend ostream & operator<<(ostream & os, RobotPathGenome & arg)
    {arg._write(os); return(os);}
  friend istream & operator>>(istream & is, RobotPathGenome & arg)
    {arg._read(is); return(is);}

  GAListGenome<int> & path(const int i){return *list[i];}
  int npaths() const {return n;}

protected:
  int n;
  GAListGenome<int> **list;
  void _read(istream & is){
    for(int i=0; i<n; i++)
      is >> *(list[i]);
  }
  void _write(ostream & os){
    for(int i=0; i<n; i++)
      os << *(list[i]) << "\n";
  }

private:
  RobotPathGenome(){}
};
Please see the examples for complete details.
28 September 1995