Yonela Nuba

Reputation: 143

Automatic feature selection - Sklearn.feature_selection

I have two datasets, train and test, with train.shape = (307511, 122) and test.shape = (48744, 121). Both contain columns of dtype int32, float64 and object.

I applied one-hot encoding to convert the object columns to numeric dtypes.

train = pd.get_dummies(train)
test = pd.get_dummies(test)
print('Train dummies shape: {}'.format(train.shape))
print('Test dummies shape: {}'.format(test.shape))

I got these results from the code above:

Train dummies shape: (307511, 246)
Test dummies shape: (48744, 242)

The shapes changed, so the one-hot encoding succeeded. However, when I try to fit a model on the data I get this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32')
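
This error usually means that some numeric columns still hold missing or infinite values, which one-hot encoding does not remove. A minimal sketch of how to locate them is shown here; the small frame and its column names a, b and c are made up for illustration and stand in for the encoded train data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the one-hot encoded `train` frame.
train = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],   # contains a missing value
    "b": [np.inf, 2.0, 3.0],   # contains an infinite value
    "c": [1, 2, 3],            # clean integer column
})

# Only numeric columns can hold NaN/inf in the sense of the error message.
numeric = train.select_dtypes(include=[np.number])

# Count missing values per column.
nan_counts = numeric.isna().sum()

# Count +inf / -inf per column (cast to float so integer columns are handled too).
inf_counts = pd.Series(
    np.isinf(numeric.to_numpy(dtype=float)).sum(axis=0),
    index=numeric.columns,
)

report = pd.DataFrame({"n_nan": nan_counts, "n_inf": inf_counts})

# Keep only the columns that would trip the scikit-learn input check.
flagged = report[(report["n_nan"] > 0) | (report["n_inf"] > 0)]
print(flagged)
```

Running the same counts on the real train and test frames should show which columns need cleaning before fitting.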

These are my imports:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel 
from sklearn.ensemble import ExtraTreesClassifier

Please help

Upvotes: 1

Views: 590

Answers (1)

fuwiak

Reputation: 741

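Try converting both frames to float arrays:

import numpy as np
# as_matrix() and np.float no longer exist in current pandas/NumPy; use to_numpy() and np.float64.
X_train = train.to_numpy().astype(np.float64)
X_test = test.to_numpy().astype(np.float64)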
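
Casting to float does not by itself remove NaN or infinite entries, so the values probably need cleaning before the conversion. The following is a minimal sketch under several assumptions: the frames are tiny made-up stand-ins, the label column name TARGET and the feature names are hypothetical, and median imputation is just one reasonable choice rather than the required one. It also keeps only the columns present in both frames, since the encoded train and test sets in the question have different widths (246 vs 242).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the one-hot encoded frames from the question.
train = pd.DataFrame({
    "x1": [1.0, np.nan, 3.0, 5.0],
    "x2": [np.inf, 2.0, 4.0, 6.0],
    "cat_rare": [0, 0, 1, 0],      # dummy column that only exists in train
    "TARGET": [0, 1, 0, 1],        # assumed name of the label column
})
test = pd.DataFrame({
    "x1": [np.nan, 2.0],
    "x2": [1.0, -np.inf],
})

# Separate the label so it is not dropped by the column alignment.
y_train = train.pop("TARGET")

# Keep only the feature columns that both frames share.
train, test = train.align(test, join="inner", axis=1)

# Treat +/- infinity as missing so a single imputation step covers both cases.
train = train.replace([np.inf, -np.inf], np.nan)
test = test.replace([np.inf, -np.inf], np.nan)

# Fill gaps with medians computed on train only, then reuse them for test.
medians = train.median()
train = train.fillna(medians)
test = test.fillna(medians)

# Convert to plain float arrays, as the answer above suggests.
X_train = train.to_numpy(dtype=np.float64)
X_test = test.to_numpy(dtype=np.float64)

print(X_train.shape, X_test.shape)
```

Computing the medians on train alone and reusing them for test avoids leaking information from the test set into the fitted model.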

Upvotes: 2
