Reputation: 276
I am performing feature selection (on a dataset with 100,000 rows and 32 features) using multinomial logistic regression in Python. What would be the most efficient way to select features in order to build a model for a multiclass target variable (1, 2, 3, 4, 5, 6, 7)?
Upvotes: 0
Views: 2488
Reputation: 6754
There are, of course, several methods for choosing your features, but sometimes the following simple approach can help. You can assess the contribution of your features (their potential for predicting the response variable) using linear models. Note that this mainly works in situations where you suspect a linear dependence between your features and the response.
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Guerry dataset, as used in the statsmodels formula docs
df = sm.datasets.get_rdataset("Guerry", "HistData").data

# Lottery is the response (y); the features (X) appear to the right of ~
mod = smf.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)
res = mod.fit()
print(res.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                Lottery   R-squared:                       0.338
Model:                            OLS   Adj. R-squared:                  0.287
Method:                 Least Squares   F-statistic:                     6.636
Date:                Tue, 28 Feb 2017   Prob (F-statistic):           1.07e-05
Time:                        21:36:08   Log-Likelihood:                -375.30
No. Observations:                  85   AIC:                             764.6
Df Residuals:                      78   BIC:                             781.7
Df Model:                           6
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept      38.6517      9.456      4.087      0.000      19.826      57.478
Region[T.E]   -15.4278      9.727     -1.586      0.117     -34.793       3.938
Region[T.N]   -10.0170      9.260     -1.082      0.283     -28.453       8.419
Region[T.S]    -4.5483      7.279     -0.625      0.534     -19.039       9.943
Region[T.W]   -10.0913      7.196     -1.402      0.165     -24.418       4.235
Literacy       -0.1858      0.210     -0.886      0.378      -0.603       0.232
Wealth          0.4515      0.103      4.390      0.000       0.247       0.656
==============================================================================
Omnibus:                        3.049   Durbin-Watson:                   1.785
Prob(Omnibus):                  0.218   Jarque-Bera (JB):                2.694
Skew:                          -0.340   Prob(JB):                        0.260
Kurtosis:                       2.454   Cond. No.                         371.
==============================================================================
The higher the R-squared value, the better your chosen combination of features can predict the response in a linear model. If they can predict well in linear models then, I think, they have even greater potential with more complex models such as decision trees.
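For example, a minimal sketch of this idea (assuming the same Guerry dataset as above, and a few hypothetical feature subsets chosen purely for illustration) could compare R-squared across candidate formulas:

import statsmodels.api as sm
import statsmodels.formula.api as smf

# Guerry dataset, as used in the statsmodels formula docs
df = sm.datasets.get_rdataset("Guerry", "HistData").data

# Candidate feature subsets (hypothetical choices, for illustration only)
subsets = ['Literacy', 'Wealth', 'Literacy + Wealth', 'Literacy + Wealth + Region']

for rhs in subsets:
    res = smf.ols(formula='Lottery ~ ' + rhs, data=df).fit()
    # Adjusted R-squared penalizes extra features, so it is the fairer
    # number when comparing subsets of different sizes
    print('%-28s R^2=%.3f  adj. R^2=%.3f' % (rhs, res.rsquared, res.rsquared_adj))

Plain R-squared never decreases when you add features, which is why the adjusted value is printed alongside it.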
See the following page for more details (note that if the errors of your data are heteroskedastic, some additional handling may be required to get correct results): http://www.statsmodels.org/dev/example_formulas.html
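As a minimal sketch of that note, statsmodels lets you request heteroskedasticity-robust standard errors (e.g. HC3) at fit time; the coefficient estimates stay the same, but the standard errors, t-statistics and p-values are corrected:

import statsmodels.api as sm
import statsmodels.formula.api as smf

df = sm.datasets.get_rdataset("Guerry", "HistData").data
mod = smf.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)

# HC3 robust covariance: same coefficients, corrected std err / t / p-values
res_robust = mod.fit(cov_type='HC3')
print(res_robust.summary())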
And of course I also recommend building a pair plot of your features.
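A short sketch with seaborn (assuming your features live in a DataFrame df, and 'target' is a hypothetical name for your multiclass column):

import seaborn as sns
import matplotlib.pyplot as plt

# Scatter plot for every pair of features, distributions on the diagonal;
# hue colours the points by the multiclass target
sns.pairplot(df, hue='target')
plt.show()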
These methods are not very deep; they rely on correlations and on what you can see, but sometimes (in straightforward situations) they are pragmatic.
Upvotes: 2