Sadegh

Reputation: 131

Feature selection for Logistic Regression

Both the Kaplan-Meier method and Logistic Regression have their own feature selection approaches. I want to use another method to pick the best features, for example backward stepwise feature selection. Is it possible to use this sort of method instead?

My data has more than 130 features and about 3000 individuals. Since it is medical (cancer) data, I don't want to use simple methods.

Further information about the project can be seen here. The steps I plan to follow, in order, are:

  1. Preprocessing the data
  2. Splitting the data into train and test sets
  3. Data imputation on the train data
  4. Feature selection on the train data
  5. Training the models, which are Kaplan-Meier and Logistic Regression
  6. Testing the models

Please let me know whether it is wrong to use a different feature selection method for these models. Any tips about the models I have listed are also welcome.

Upvotes: 0

Views: 475

Answers (1)

spectre

Reputation: 767

There are four main types of feature selection (FS) techniques:

  1. Filter-based FS
  2. Wrapper-based FS
  3. Embedded FS techniques
  4. Hybrid FS techniques

Each has its own advantages and disadvantages. For example, filter-based FS checks whether a single feature is important to the output variable, one feature at a time. So if you have 400 features in your dataset, the scoring is repeated 400 times, once per feature.
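To make the filter idea concrete, here is a minimal sketch using scikit-learn's `SelectKBest` with the ANOVA F-test (`f_classif`). The synthetic data, the choice of score function and `k=5` are all assumptions for illustration, not a recommendation for your data. Note that the library does the per-feature scoring for you in a single call.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the real data (assumption): 300 samples, 20 features.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Score each feature against the target independently (ANOVA F-test),
# then keep the 5 highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

# One score per original feature, and the indices of the features kept.
scores = selector.scores_
kept = selector.get_support(indices=True)
print(scores.shape, kept)
```

Because each feature is scored on its own, this kind of filter cannot see interactions between features.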

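Wrapper-based methods (such as the backward stepwise selection you mention in your question), on the other hand, evaluate subsets of features by fitting a model, so all features are handled in one procedure. However, they are more prone to overfitting than filter-based methods.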
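Backward stepwise selection can be sketched with scikit-learn's `SequentialFeatureSelector` and `direction="backward"`. This is only an illustration under assumptions: the data is synthetic, and the target of 3 features and the 5-fold cross-validation are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset (assumption) so the search stays fast.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Start from all features and repeatedly drop the one whose removal
# hurts the cross-validated score the least, until 3 remain.
model = LogisticRegression(max_iter=1000)
sfs = SequentialFeatureSelector(
    model, n_features_to_select=3, direction="backward", cv=5
)
sfs.fit(X, y)

kept = sfs.get_support(indices=True)
X_reduced = sfs.transform(X)
print(kept)
```

With 130 features this search refits the model many times, so expect it to be much slower than a filter.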

Embedded methods perform the selection as part of model training, for example by using the feature importances of tree-based models.
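A rough sketch of the tree-based variant, using scikit-learn's `SelectFromModel` around a `RandomForestClassifier`. The synthetic data, the forest size and the `"median"` threshold are assumptions chosen only to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the real data (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Fit a random forest and keep the features whose importance is
# at least the median importance across all features.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, threshold="median")
X_selected = selector.fit_transform(X, y)

# Importances come from the fitted forest stored on the selector.
importances = selector.estimator_.feature_importances_
print(X_selected.shape[1], "features kept")
```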

I do not have enough knowledge about hybrid methods.

I would say you could use a wrapper-based technique such as RFECV (recursive feature elimination with cross-validation, available in scikit-learn), since you say you do not want to use simple filter techniques.
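Something along these lines could work for the Logistic Regression part, assuming scikit-learn. It follows the order in your question: split first, then impute and select using the train data only, which a `Pipeline` takes care of. The synthetic data, the 5% missing values, median imputation and `roc_auc` scoring are all placeholders I made up for the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with about 5% missing values (assumption).
X, y = make_classification(
    n_samples=400, n_features=15, n_informative=4, n_redundant=0, random_state=0
)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Split before imputation and selection so the test set stays untouched.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    # RFECV drops one feature per step and picks the feature count
    # with the best cross-validated score.
    ("select", RFECV(
        LogisticRegression(max_iter=1000),
        step=1,
        cv=StratifiedKFold(5),
        scoring="roc_auc",
        min_features_to_select=1,
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Every step is fitted on the train data only.
pipeline.fit(X_train, y_train)

n_kept = pipeline.named_steps["select"].n_features_
test_accuracy = pipeline.score(X_test, y_test)
print(n_kept, "features kept")
```

The fitted `select` step also exposes `support_`, a boolean mask over the original columns, if you want to see which features survived.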

Upvotes: 1
