user76170

Reputation: 193

Genetic algorithms: fitness function not working properly

I have a binary dataset of m instances and n features (m x n, with m >> n), plus a target variable (class attribute) that is also binary. I want to do feature selection using a genetic algorithm. I decided to use 0/1 strings in the GA, where 0 means a feature is not selected and 1 means it is selected. I generated a random population of K bit strings, so each of the K strings represents one possible selection of features. To build a fitness function, I train a neural network on each of these K feature sets (models) and then, based on the accuracy on a separate validation set, compute this fitness for each model:

fitness=tradeoffk*Valacc+(1-tradeoffk)*(ones(no_of_models,1)*n-featSel)/maxFeat;

This fitness function is a tradeoff between the number of features passed for training (featSel) and the validation accuracy reported by the neural network (Valacc). I tried different values for tradeoffk: 0.5, 0.2 and 0.8.
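To make the tradeoff concrete, here is a minimal Python sketch of the same weighted sum. It is not the original MATLAB code: the function names are hypothetical, the accuracies are made-up numbers, and since maxFeat is not defined in the question, the sketch assumes it equals the total number of features n.

```python
def fitness(val_acc, n_selected, n_features, max_feat, tradeoff):
    """Weighted sum of validation accuracy and the share of features left out."""
    # Mirrors (n - featSel) / maxFeat from the expression above.
    sparsity = (n_features - n_selected) / max_feat
    return tradeoff * val_acc + (1 - tradeoff) * sparsity


def population_fitness(population, val_accs, tradeoff, max_feat=None):
    """Score every bit string; population is a list of 0/1 lists of equal length."""
    n_features = len(population[0])
    if max_feat is None:
        # Assumption: maxFeat is the total feature count n.
        max_feat = n_features
    return [
        fitness(acc, sum(bits), n_features, max_feat, tradeoff)
        for bits, acc in zip(population, val_accs)
    ]


# Made-up example: three candidate feature subsets over n = 4 features.
population = [[1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
val_accs = [0.80, 0.90, 0.60]  # illustrative validation accuracies
scores = population_fitness(population, val_accs, tradeoff=0.5)
for bits, score in zip(population, scores):
    print(bits, round(score, 3))
```

With these illustrative numbers and a tradeoff of 0.5, the string that keeps a single feature at 60% accuracy scores higher than the string that keeps all four features at 90% accuracy, so under this assumption about maxFeat the feature-count term can outweigh a large accuracy gap.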

I ran 10 iterations of the GA, each for 20 generations, and checked how the fitness grows. However, there is no significant change in the fitness. In a GA, the fitness is generally expected to grow and then stabilize, but here it grows only marginally.

For instance, this is the sample output of one of these iterations:

gen=001  avgFitness=0.808   maxFitness=0.918
gen=002  avgFitness=0.808   maxFitness=0.918
gen=003  avgFitness=0.815   maxFitness=0.918
gen=004  avgFitness=0.815   maxFitness=0.918
gen=005  avgFitness=0.817   maxFitness=0.918
gen=006  avgFitness=0.818   maxFitness=0.918
gen=007  avgFitness=0.818   maxFitness=0.918
gen=008  avgFitness=0.819   maxFitness=0.918
gen=009  avgFitness=0.819   maxFitness=0.918
gen=010  avgFitness=0.819   maxFitness=0.918
gen=011  avgFitness=0.819   maxFitness=0.918
gen=012  avgFitness=0.819   maxFitness=0.918
gen=013  avgFitness=0.819   maxFitness=0.918
gen=014  avgFitness=0.819   maxFitness=0.918
gen=015  avgFitness=0.819   maxFitness=0.918
gen=016  avgFitness=0.819   maxFitness=0.918
gen=017  avgFitness=0.819   maxFitness=0.918

Also, the neural network takes a long time to train (> 2 hours for 20 generations). Could anyone suggest where this is possibly going wrong and how to improve it?

Upvotes: 1

Views: 1201

Answers (1)

Andreas

Reputation: 6475

You could use linear discriminant analysis (LDA) for your validation model instead of a neural network. It is much quicker to train, but of course it cannot represent non-linear relationships. Have you tried genetic programming? It has feature selection built in, since it builds a model and selects features at the same time. You could give HeuristicLab a try; it has a quite powerful genetic programming implementation that also supports classification.
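As a rough illustration of the LDA suggestion, the following Python sketch scores one feature mask by LDA validation accuracy. It assumes NumPy and scikit-learn are available; the helper name lda_validation_accuracy and the synthetic data are hypothetical and only stand in for the real dataset.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def lda_validation_accuracy(mask, X_train, y_train, X_val, y_val):
    """Fit LDA on the features selected by a 0/1 mask and score it on validation data."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        # An empty feature set cannot be trained; treat it as worst case.
        return 0.0
    model = LinearDiscriminantAnalysis()
    model.fit(X_train[:, cols], y_train)
    return float(model.score(X_val[:, cols], y_val))


# Synthetic stand-in data (an assumption, not the asker's dataset):
# feature 0 copies the label with about 10% of entries flipped,
# features 1..3 are random bits unrelated to the label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=600)
flips = rng.random(600) < 0.1
informative = np.where(flips, 1 - y, y)
noise = rng.integers(0, 2, size=(600, 3))
X = np.column_stack([informative, noise]).astype(float)

X_train, y_train = X[:400], y[:400]
X_val, y_val = X[400:], y[400:]

acc_informative = lda_validation_accuracy([1, 0, 0, 0], X_train, y_train, X_val, y_val)
acc_noise = lda_validation_accuracy([0, 1, 1, 1], X_train, y_train, X_val, y_val)
print("informative feature only:", acc_informative)
print("noise features only:", acc_noise)
```

The returned accuracy could take the place of Valacc in the fitness expression from the question, so each of the K models is evaluated with a quick LDA fit rather than a full neural network training run.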

Upvotes: 0
