Reputation: 2350
Here is my code for feature selection in Python:
from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
X.shape
(150, 4)
X_new = LinearSVC(C=0.01, penalty="l1", dual=False).fit_transform(X, y)
X_new.shape
(150, 3)
But after getting the reduced feature matrix (X_new), how do I know which variables were removed and which were kept? (That is, which one was removed, or which three are still present in the data.)
I need to identify them so that I can apply the same filtering to new test data.
Upvotes: 2
Views: 290
Reputation: 3707
I modified your code a little. For each class, you can see which features are used by looking at the coefficients of LinearSVC: with the L1 penalty, a feature whose coefficients are (effectively) zero for every class is one that was dropped. According to the documentation, coef_ : array, shape = [n_features] if n_classes == 2 else [n_classes, n_features]
As for new data, you just need to apply transform to it.
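from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris
import numpy as np
iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)
# Keep a reference to the fitted estimator so it can be inspected and reused.
# Note: fit_transform/transform on LinearSVC itself is only available in older
# scikit-learn releases; newer ones dropped it from estimators.
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False)
X_new = lsvc.fit_transform(X, y)
print(X_new.shape)
# One row of coefficients per class, one column per original feature.
print(lsvc.coef_)
# Random data with the same 4 columns stands in for new test data.
newData = np.random.rand(100,4)
newData_X = lsvc.transform(newData)
print(newData_X.shape)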
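For recent scikit-learn versions, where LinearSVC no longer has fit_transform, the same idea can be sketched with SelectFromModel. This is only a sketch under some assumptions: it relies on SelectFromModel's default threshold for L1-penalised models, and the names selector, mask, kept, dropped and new_data are mine, not from the code above. The boolean mask returned by get_support() answers the "which columns survived" question directly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

iris = load_iris()
X, y = iris.data, iris.target

# Fit the L1-penalised model first, then wrap it in a selector.
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
selector = SelectFromModel(lsvc, prefit=True)

# Boolean mask over the original columns: True means the feature is kept.
mask = selector.get_support()
kept = [name for name, keep in zip(iris.feature_names, mask) if keep]
dropped = [name for name, keep in zip(iris.feature_names, mask) if not keep]
print("kept:", kept)
print("dropped:", dropped)

# The same selector filters any new data that has the original 4 columns.
new_data = np.random.rand(100, 4)  # hypothetical stand-in for real test data
new_data_selected = selector.transform(new_data)
print(new_data_selected.shape)
```

Indexing with the mask (new_data[:, mask]) should give the same columns as selector.transform, which is handy if you only want to store the mask and not the fitted model.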
Upvotes: 1