Reputation: 325
Im just learning about genetic algorithm when i was given a task to design a genetic algorithm that learns rules that predicts if a person will vote yes or no given a data set.
I've been reading in book and internet about GA and GP for 2 days straight. So now i kind of understand the concept of GA about the population management, genetic operators, fitness functions and crossover with the different types of crossover masks. But i'm still nowhere near making my own GA for a given data set. I just don't get how to start or with what and i'm kind of getting desperate since i get a feeling i'm to stupid for this.
So any kind of help, such as hints, tips or pseudo code, will be much appreciated!
The given data set is as follows (groups):
G1 | G2 | G3 | G4
A1 | B1 | C1 | None
A2 | B2 | C2 | D2
A3 | B3 | C3 | D3
A4 | B4 | C4 | D4
A5 | - | - | D5
Well the data are not a,b,c's. They are something else much longer, but i'm kind of lazy so yea :P The - means there is no more attributes. Note that none is an attribute. Thanks for any sort of help guys!
Upvotes: 0
Views: 726
Reputation: 318
First, and foremost, you'll have to determine what you're trying to solve with your data set in the first place. You generally use a genetic algorithm to tackle non-deterministic problems: problems that take a long time to solve, but whose answers are easily verifiable.
So the first question is: what does your dataset represent?
The second question: what are you trying to solve and is a genetic algorithm a fitting method to solve your problem?
Anyway, creating a genetic algorithm is done through the following steps:
p(c)
and the mutation probability p(m)
So, you have to find a notation for your solution (such as an array of bits or a string) that allows you to swap parts of chromosomes easily. Then you have to identify the crossover and mutation operations. If you're dealing with ordered chromosomes, then depending on the applied crossover strategy you may have to repair your chromosomes afterwards. An ordered chromosome is a chromosome where the order or the genes matter. If you preform a standard crossover on two solution that represent the cities that the travelling salesman has to visit, you might end up with a chromosome where he visits some cities twice or more and some not at all!
There's no clear description on how to translate each problem in a genetic algorithm, because it's different for each problem. The above steps don't change, but you may need to introduce several different crossover and mutation operations to prevent premature convergence.
Upvotes: 1
Reputation: 4412
Well, I do not fully understand the description of the dataset, so my answer is based on the following assumptions: We have a set of attributes, say n different one. Each attributes have a set of different possible symbolic (=non numeric) values, say m(i) different possibilities. Each person have the same attributes, but some of them might be missing or None.
If these assumptions are correct and the set of attributes and possible values are not too high, then one of these might work:
if these two sets are really small you could have an n dimensional array as an individual/genotype. Each dimension would have the size m(i) and each value of this structure would be the yes/no answer. It would be the generalization (=more dimensions) of a fixed size (bit) vector. How to create random/mutate/crossover should be easy. Fitness would be how often it makes a good prediction.
if they are bigger then you will need something more complicated. One possibility is to have lists of rules. Each rule could be an vector with length n + a yes/no flag. In each position of the vector you would have a possible value of the related attribute. You could also have a jolly joker attribute accepting everything. Interpretation of a rule (p:person, r:rule) : if p1=r1 and p2=r2 and ... pn=rn then the result is the flag of the rule. You will have to evaluate rules until you find a matching one. You will also need a default. Genetic operators are a bit more tricky in this case, but I think you will find something if you search for variable length encoding. I've used a similar encoding (for a different problem) and it worked fine.
To be honest I'm not 100% sure if a genetic algorithm is the best choice for this problem, especially if the values are not symbolic, but numeric. It seems to be a pattern matching problem, and for that there are much better solutions. I would look for some alternatives, e.g. neural networks in the numeric case.
Upvotes: 0