Uni work has been non stop at the moment. It’s a funny thing when you see several other groups of people coming into the same labs to work on assignments on weekends and consultation week. This weeks flavour of work for most was either Enterprise Java (I cannot wait till this one is over) and AI.

I’ll leave my thoughts on Enterprise Java for another time (because it will most likely turn into a rant ;)Current explorations into how genetic algorithms work has been quite interesting. The idea of borrowing ideas from nature (evolutionary pressure, recombination and mutation) to solve problems is an interesting one. Especially for us programmers :) My initial thoughts behind creating a sudoku solver was something like:

  • Sudoku class: to have a main Sudoku class to handle file reading (sudoku puzzels), displaying the solution etc
  • SudokuAlgorithm interface: providing a generic solve(Problem) interface for different algorithms to implement
  • Problem class: stored the base problem (an 81 integer array (9×9) along with some other flags like ‘isSolved’, ‘hasSolution’ etc
  • SudokuException: Sudoku exceptions :)
  • Population class: A population class for things like InitPopulation, GetFittest, Mutate, Evaluate
  • GA class: actual implementation of a GA

When I further started looking at thigs like repairing, different types of recombination (1 point, 2 point, 3 point…..cyclic, random), different types of selection processes (best, tournament, random) I thought it was about time for a bit of an overhaul and re-design.

So, classes got dropped and classes got added to cater for newer concepts and to hold abstract concepts a little better. More interfaces were added like IRecombination to allow a basic GA, or suped up GA to use different types of Recombination classes (crossover, cyclic, random and so on).

Some of the biggest mentions include the Gene and Chromsome classes. The basic int[81] array got replaced with a Gene class that could hold some extra information other then its value, like whether that gene was read only (a given in the sudoku board). So, using arrays of genes was a step further. It wasn’t until I started looking further into the process of beefing up the fitness(evaluation) function and a few other things that I realised it might be handy to go another step and add a Chromosome class, that would hold an array of 81 Genes.

This Chromsome class was a big, big step further. It would be a ’smart Chromosome class’ that could give plenty of information on its gene structure. Rather then it being a dumb array of genes. It would know how to return a row of genes, or a column of genes, or a block of genes from within the chromosome itself. This of course maps the Sudoku problem domain perfectly.

At this point, it is no longer a basic GA. This then made it exceptionally easy to implement a good fitness function. The fitness function looks at the uniquness of each row. column or block (a perfect row, column or block should have exactly 1-9) and assigns it a value. This is done on all rows, to get a total row uniqueness, all columns and all blocks. Overall, representing how fit that chromosome (potential solution) is.

The individual algorithm applied is ((float)(1.0/(10-hashCount))/9.0). When calling NextRow, NextColumn, or NextBlock on the chromosome the Gene array is analysed and each value is added to a hashtable. A pefect sequence would have a count of exactly 9 elements, one with duplicats would have less then 9 elements. Hence, a perfect row would result in (1.0/10-9)/9 or .111111*. Nine perfect rows would give .9999999, as would columns and blocks. In the end, a perfect block, column or row would have a value of 1. The total fitness for a chromsome was then ‘totalRowsFitness * totalColumnsFitness * totalBlocksFitness’. All said and done, a ‘perfect’ sudoku board using the 3 main sudoku rules would have a fitness of 1.Further things were added to the ’smart Chromosome’ class to make repair operations, mutations and the like easier. All in all it incorporates functionality like this…

#region Properties
public Gene[] Genes
public Gene[] NextColumn
public Gene[] NextRow
public Gene[] NextBlock
public List UnderRepresentedGenes
public List OverRepresentedGenes
#endregion#
 
region Public Methods
public void ResetRow();
public void ResetColumn();
public void ResetSquare();
public int GeneFrequency(Gene gene);
public void GeneFrequenceyUpdate();
public void AddGene(Gene gene);
public void SwitchGenes(Gene gene, Gene, gene2);
public bool isUnderRepresented(Gene gene);
public bool isOverRepresented(Gene gene);
public void UpdateGeneFreqList();
public void ReplaceGene(int index, Gene gene);
public void ReplaceGene(Gene toReplace, Gene gene);
public object Clone();

From above, you can see that you can actualy get quite a lot of information from the chromosome and have it do some nice things. When a chromosome becomes unbalanced, it essentialy knows how to balance itself by looking at how many under and over represented genes it has, switching these genes around until equilibrium is achieved.

Because a chromosome knows specific gene informatoin about rows, columns and blocks. Instead of the traditional crossover form of recombination, blocks, columns or rows could be recombined from each parent producing potentially better offspring. Especially if you do some pre analysis and take blocks etc that have a good fitness rating. Some solutions may never be found because a tradional crossover may never get the correct rows, blocks or columns from one part of the board into another part where it is actually needed. This method could help and thus prove to be very powerful inded.

At this point I am now starting to solve basic sudoku puzzels with 20, 30 or 40 missing numbers. The biggest problem I am having is stagnation or convergence. My population often gets to around 60-70% of the solution before effectively cloning or converging together. Whilst mutation does help introduce new gene informatoin back into the pool, it’s often not enough when you have reached total convergence. Unless of course you implemented a form of dynamic mutation that looked at how close the population was from convergence.

Increased mutation as the population converged to a single chromosome point, or backed off mutation when there was a large amount of diversity already. This is something I might look at. All in all, from fitness functions, selection processes, recombination and mutation effects to smart chromosome structures and repair functions. Population sizes to generations running, there are a myriad of things that affect if, and how quickly a GA can get to a solution (if at all).Tech Tags: