BaluJr.
BaluJr.

Reputation: 1080

Genetic Algorithm Chromosome Design for Integerinterval not equal to power of two

I have a rather obvious problem and I am a little bit confused, that I couldn't find any answer in the internet:

Binary representation of a discrete phenotype within the chromosome appears to be the norm. But all explanations, as for example the wikipedia page, just mention, that you can represent 2^l phenotypes by a chromosome of length l.

But what is the best strategy if I have for example 300 phenotypes? Then a lot (512-300=212) of my possibly generated chromosomes are wrong at all. How to handle such an Interval with an upper bound?

For me this problem is even more relevant, since I have many smaller dimensions which I stick together. If each of my 10 dimensions has for example size 33, there are 10 bits in my final chromosome, which can only be 1, if all other bits of that dimension are 0.

Thank you in advance!

Upvotes: 1

Views: 89

Answers (3)

BaluJr.
BaluJr.

Reputation: 1080

Finally I want to let you know, what I did in my project and give some publications for verifiction.

First to the encoding: While both Gray encoding and binary encoding have been considered, the work of Rothlauf ("The Influence Of Binary Representations Of Integers On The Performance Of Selectorecombinative Genetic Algorithms." Rothlauf 2012) underlined the inherent advantage of the binary representation.

The more severe problem on the other hand, was to find a proper strategy to represent n integers by m bits, which could originally represent 2^m integers. Even when taking the least possible amount of bits, there are 2^m-n chromosomes, which do not represent a valid index. To target this problem, two different strategies arise (See: "On methods for discrete structural optimization" Ringertz 1988, "Genetic algorithms in optimization problems with discrete and integer design variables" Lin 1992, "Methods for optimization of nonlinear problems with discrete variables: a review" Arora 1994}):

  • Penalty approach: Following this approach, all invalid genomes are assigned a very low fitness. This causes them to disappear with high possibility.
  • Excessive distribution approach: In this approach, the invalid chromosomes are assigned to valid phenotypes by a predefined mapping strategy. A natural associated disadvantage is the resulting uneven distribution between the valid values, since $2^m-n$ of the $n$ genes will have two representations instead of only a single one. This problem can be reduced by using even more additional bits for the chromosome. The more bits one takes, the more equally, the distribution will be. Easy to understand: Eg in an extreme case 1024 possible chromosome configurations distributed over 15 solutions will lead to a more equal distribution then 16 possible configurations, where only 1 solution gets a second representation.

For the support of multiple dimensions, the genes of all dimensions have been concatenated.

Upvotes: 0

zegkljan
zegkljan

Reputation: 8419

There are two things - genotype and phenotype.

Genotype is the thing you manipulate by the evolution. These are the 1s and 0s but it can be whatever you want.

Phenotype is the thing you actually evaluate for fitness, the thing you ultimately want to optimize, e.g. a path in TSP.

Actually, I lied, there are three things. The third one is genotype-phenotype mapping and it stands between the former two things. This mapping's responsibility is to produce a phenotype for each possible genotype.

In the end, it's about the design of the mapping and consequently the genotype so that it does what you need. You say you have 300 possible phenotypes? And you want to use binary representation? Well, design you mapping that it maps 512 objects onto 300 objects. However you feel it is ok. Done.

General rule of thumb is that the mapping should "scramble" the genotype the least possible. I.e. it should, as most as possible, allow the fact that small changes in genotype produce small changes in phenotype.


A small note: why do you search the space of size 300 by GA? Couldn't it be done by exhaustive search?

Upvotes: 2

Ray
Ray

Reputation: 3109

Binary representation of a discrete phenotype within the chromosome appears to be the norm

I don't think that's true. And even if it was...who cares. You don't have to use a binary representation.

You didn't specify the problem you're trying to optimize. Each problem will have a different form of representation. Use whatever is useful.

Upvotes: 1

Related Questions