Reputation: 193
I have a binary dataset of m instances and n features (m x n), with m >> n, and a binary target variable (class attribute). I want to do feature selection using a genetic algorithm. I decided to use 0/1 bit strings in the GA, where 0 means a feature is not selected and 1 means it is selected. I generated K random bit strings, so each of the K bit strings represents a possible selection of features. To build a fitness function, I train a neural network on each of these K feature sets (models), and then, based on the accuracy on a separate validation set, I compute this fitness for each model:
fitness=tradeoffk*Valacc+(1-tradeoffk)*(ones(no_of_models,1)*n-featSel)/maxFeat;
This fitness function is a tradeoff between the number of features used for training (featSel) and the validation accuracy reported by the neural network (Valacc). I tried different values of tradeoffk, such as 0.2, 0.5, and 0.8.
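In Python/NumPy terms (a hypothetical re-implementation of the MATLAB-style line above; the names `val_acc`, `feat_sel`, `max_feat` mirror the original), the fitness works out to:

```python
import numpy as np

def fitness(val_acc, feat_sel, n, max_feat, tradeoff_k):
    """Tradeoff between validation accuracy and subset size.

    val_acc  : (K,) validation accuracy of each model
    feat_sel : (K,) number of features selected by each bit string
    n        : total number of features
    max_feat : normalising constant (e.g. n)
    """
    val_acc = np.asarray(val_acc, dtype=float)
    feat_sel = np.asarray(feat_sel, dtype=float)
    # Reward accuracy, penalise large feature subsets.
    return tradeoff_k * val_acc + (1 - tradeoff_k) * (n - feat_sel) / max_feat

# Example: 3 models over 10 features; smaller subsets get a bonus.
f = fitness([0.90, 0.85, 0.95], [4, 2, 9], n=10, max_feat=10, tradeoff_k=0.5)
```

Note that with tradeoffk = 0.5 a model that drops one extra feature gains as much fitness as one that gains 1/maxFeat in accuracy, so the penalty term can easily dominate small accuracy differences.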
I ran 10 iterations of the GA, each for 20 generations, and checked how the fitness evolves. In a GA, the fitness is generally expected to grow and then stabilize, but here it grows only marginally.
For instance, this is the sample output of one of these iterations:
gen=001 avgFitness=0.808 maxFitness=0.918
gen=002 avgFitness=0.808 maxFitness=0.918
gen=003 avgFitness=0.815 maxFitness=0.918
gen=004 avgFitness=0.815 maxFitness=0.918
gen=005 avgFitness=0.817 maxFitness=0.918
gen=006 avgFitness=0.818 maxFitness=0.918
gen=007 avgFitness=0.818 maxFitness=0.918
gen=008 avgFitness=0.819 maxFitness=0.918
gen=009 avgFitness=0.819 maxFitness=0.918
gen=010 avgFitness=0.819 maxFitness=0.918
gen=011 avgFitness=0.819 maxFitness=0.918
gen=012 avgFitness=0.819 maxFitness=0.918
gen=013 avgFitness=0.819 maxFitness=0.918
gen=014 avgFitness=0.819 maxFitness=0.918
gen=015 avgFitness=0.819 maxFitness=0.918
gen=016 avgFitness=0.819 maxFitness=0.918
gen=017 avgFitness=0.819 maxFitness=0.918
Also, the neural network takes a long time to train (> 2 hours for 20 generations). Could anyone give further suggestions, and point out where it is possibly going wrong?
Upvotes: 1
Views: 1201
Reputation: 6475
You could use linear discriminant analysis (LDA) as your validation model instead of a neural network. It is much quicker to train, though of course it cannot represent non-linear relationships. Have you tried genetic programming? It has feature selection built in, since it builds a model and selects features at the same time. You could give HeuristicLab a try, which has a quite powerful genetic programming implementation that also includes classification.
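To illustrate how much cheaper the fitness evaluation becomes with LDA, here is a sketch (assuming scikit-learn; the function name and tradeoff weighting are illustrative, following the questioner's fitness formula) that scores one GA bit string:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_fitness(bits, X_train, y_train, X_val, y_val, tradeoff_k=0.5):
    """Score one GA individual: LDA validation accuracy traded off
    against the number of selected features."""
    mask = np.asarray(bits, dtype=bool)
    if not mask.any():          # an empty subset gets the worst score
        return 0.0
    clf = LinearDiscriminantAnalysis()
    clf.fit(X_train[:, mask], y_train)  # trains in milliseconds, not hours
    val_acc = clf.score(X_val[:, mask], y_val)
    n = mask.size
    return tradeoff_k * val_acc + (1 - tradeoff_k) * (n - mask.sum()) / n
```

Since each evaluation is now cheap, you can afford larger populations and more generations, which may also help with the stagnation you observed.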
Upvotes: 0