Reputation: 6697

Genetic algorithm - Subset sum problem

I have to do a project which using a genetic algorithm solves the subset sum problem. Unfortunately, when coding the algorithm I found a big problem ...

My algorithm:

as long as no solution was found and the number of steps is smaller than steps do:
calculate the probability and then distribution function for each chromosome
perform selection (roulette)
select n chromosomes to be crossed
perform the crossing (the crossing point is selected randomly)
select m chromosomes to mutation
perform mutations
if you found a solution then stop

(Algorithm was taken from the book "Genetic Algorithms + Data Structures = Evolution Programs, Chapter 2 ") Variables such as population size, amount of data, scope of data collection, number of steps, the number of mutations (in step), the number of crossings (in a step) is set rigidly in the program options.

The problem is that after a certain (relatively small) number of steps in the population all the chromosomes are identical. The problem illustrates this graph: http://imageshack.us/m/96/7693/wykresb.png

What I'm doing wrong? How to fix it? Thanks in advance.

Edit:

Here You can find logs from my app: http://paste.pocoo.org/show/391318/

I think that roulette is not the best solution (as deong said). Mutations also need to improve.

Upvotes: 1

Answers (3)

AJed

Reputation: 578

I had a similar problem before, I wish it s the same as your's

First, you need to check (using any measuring metric) if chromosome A is better than chromosome B. This let you have a strict order of the chromosomes of your population and be able to sort your population.

Then, when you produce a new chromosome (either by mutation or crossover) you may be producing a chromosome that already exist in your population. Make sure not to include this in your population list.

In other words, make sure your list always contains different chromosomes and always sorted from best to worst !

Note: The genetic algorithms I work with are usually like this (this is the most general algorithm and most used):

create P different chromosomes and add them to list Pop;
1. while (no optimal solution is found && number of iteration < LIM)
2. create new chromosomes using crossover, mutation or any other methods;
3. add the created chromosome to list Pop
4. sort the list Pop (from best-fit to worst-fit)
5. select the first P different chromosomes and discard all other from Pop.
6. end while

Upvotes: 1

deong

Reputation: 3870

Here's (potentially) the problem. Disclaimer is of course that you may just have a buggy program.

Roulette wheel selection is just terrible. The problem is that early on in a run, the distribution of fitness values is random. You have some awful solutions and some that are reasonable OK in comparison. You don't expect any of them to be very good, but you would expect some of them to be much better than others.

Roulette wheel selection takes these relative differences in probabilities and amplifies them. If you have a population size of 100, and one individual has a fitness five times better than any others, it will be selected five times as often. With typically mild mutation rates, you end up quickly in a situation where you choose the same individual twice for recombination, produce some new identical offspring, make very minor changes (maybe), and then put them back into the population. Because you're still early in the run, most solutions are still bad, so where you did have one above average solution, you selected it into five above average solutions, bred them to get ten above average solutions, and then started the process all over again. These solutions can very quickly take over the entire population if you aren't really careful with designing your set of operators, even though all the algorithm knows is that they're better than the really crappy solutions it has otherwise seen.

The solution is to use a better selection operator. Binary tournament selection is faster, easier to code, and applies a much more tolerable selection pressure. There is also rank-biased selection which selects proportionally by fitness ranking rather than absolute differences.

Edit: This isn't to say you can't use proportional selection. Just that it is very prone to premature convergence and to use it effectively, you typically have to build an entire set of operators with that in mind.

Upvotes: 3

Christian

Reputation: 4375

When applying genetic algorithms, it can happen that the algorithm gets stuck at local optima. However, one is interested in the global optimum (or rather an approximation to such an optimum).

Local optima can be avoided by:

A higher mutation rate
A different cross-over function

Moreover, it may be useful to kill clones. That means you "quickly" look at your population after each iteration and do not allow for clones. By quickly I mean that you just look for approximate clones, because checking for exact clones would take O(m*n^2), where n is your population size and m is the size of a chromosome. This method helped me in a different problem where I was facing clones as well.

Hope this helped, Christian

EDIT

It would also be nice if you could post your cross-over function. Preferably not as code, but in plain english text. The cross-over function is the critical part of a genetic algorithm.

Upvotes: 1

Genetic algorithm - Subset sum problem

Answers (3)

Related Questions