Celly
Celly

Reputation: 325

Making of Genetic algorithm

Im just learning about genetic algorithm when i was given a task to design a genetic algorithm that learns rules that predicts if a person will vote yes or no given a data set.

I've been reading in book and internet about GA and GP for 2 days straight. So now i kind of understand the concept of GA about the population management, genetic operators, fitness functions and crossover with the different types of crossover masks. But i'm still nowhere near making my own GA for a given data set. I just don't get how to start or with what and i'm kind of getting desperate since i get a feeling i'm to stupid for this.

So any kind of help, such as hints, tips or pseudo code, will be much appreciated!

The given data set is as follows (groups):

G1 | G2 | G3 | G4

A1 | B1 | C1 | None

A2 | B2 | C2 | D2

A3 | B3 | C3 | D3

A4 | B4 | C4 | D4

A5 | - | - | D5

Well the data are not a,b,c's. They are something else much longer, but i'm kind of lazy so yea :P The - means there is no more attributes. Note that none is an attribute. Thanks for any sort of help guys!

Upvotes: 0

Views: 726

Answers (2)

user849924
user849924

Reputation: 318

First, and foremost, you'll have to determine what you're trying to solve with your data set in the first place. You generally use a genetic algorithm to tackle non-deterministic problems: problems that take a long time to solve, but whose answers are easily verifiable.

So the first question is: what does your dataset represent?

The second question: what are you trying to solve and is a genetic algorithm a fitting method to solve your problem?

Anyway, creating a genetic algorithm is done through the following steps:

  1. Represent the problem variable domain as a chromosome of a fixed length, choose the size of the population N, the crossover probability p(c) and the mutation probability p(m)
  2. Define a fitness function f(x) to measure the performance, or fitness, of an individual chromosome in the problem domain. The fitness function establishes the basis for selecting chromosomes that will be mated during reproduction
  3. Randomly generate an initial population of chromosomes of size N: x1, x2, ..., xn
  4. Calculate the fitness of each individual chromosome: f(x1), f(x2), ..., f(xn)
  5. Select a pair of chromosomes for mating from the current population. Parent chromosomes are selected with a probability related to their fitness. Highly fit chromosomes have a higher probability of being selected for mating than less fit chromosomes.
  6. Create a pair of offspring chromosomes by applying the genetic operators - crossover and mutation
  7. Place the created offspring chromosomes in the new population
  8. Repeat step 5 until the size of the new chromosome population become equal to the size of the initial population N
  9. Replace the initial (parent) chromosome population with the new (offspring) population
  10. Go to step 4 and repeat the process until the termination criterion is satisfied.

So, you have to find a notation for your solution (such as an array of bits or a string) that allows you to swap parts of chromosomes easily. Then you have to identify the crossover and mutation operations. If you're dealing with ordered chromosomes, then depending on the applied crossover strategy you may have to repair your chromosomes afterwards. An ordered chromosome is a chromosome where the order or the genes matter. If you preform a standard crossover on two solution that represent the cities that the travelling salesman has to visit, you might end up with a chromosome where he visits some cities twice or more and some not at all!

There's no clear description on how to translate each problem in a genetic algorithm, because it's different for each problem. The above steps don't change, but you may need to introduce several different crossover and mutation operations to prevent premature convergence.

Upvotes: 1

Sandor Murakozi
Sandor Murakozi

Reputation: 4412

Well, I do not fully understand the description of the dataset, so my answer is based on the following assumptions: We have a set of attributes, say n different one. Each attributes have a set of different possible symbolic (=non numeric) values, say m(i) different possibilities. Each person have the same attributes, but some of them might be missing or None.

If these assumptions are correct and the set of attributes and possible values are not too high, then one of these might work:

  • if these two sets are really small you could have an n dimensional array as an individual/genotype. Each dimension would have the size m(i) and each value of this structure would be the yes/no answer. It would be the generalization (=more dimensions) of a fixed size (bit) vector. How to create random/mutate/crossover should be easy. Fitness would be how often it makes a good prediction.

  • if they are bigger then you will need something more complicated. One possibility is to have lists of rules. Each rule could be an vector with length n + a yes/no flag. In each position of the vector you would have a possible value of the related attribute. You could also have a jolly joker attribute accepting everything. Interpretation of a rule (p:person, r:rule) : if p1=r1 and p2=r2 and ... pn=rn then the result is the flag of the rule. You will have to evaluate rules until you find a matching one. You will also need a default. Genetic operators are a bit more tricky in this case, but I think you will find something if you search for variable length encoding. I've used a similar encoding (for a different problem) and it worked fine.

  • to make it more general (but also more complicated) you can represent your rules as trees where the internal nodes are and/or/not and possibly other logical operators, leafs are predicates like pi=ri. This would be a kind of genetic programming, google for that if you like this solution.

To be honest I'm not 100% sure if a genetic algorithm is the best choice for this problem, especially if the values are not symbolic, but numeric. It seems to be a pattern matching problem, and for that there are much better solutions. I would look for some alternatives, e.g. neural networks in the numeric case.

Upvotes: 0

Related Questions