I have two datasets, a training set and a test set: train.shape = (307511, 122) and test.shape = (48744, 121). Both contain columns of dtype int32, float64 and object.
I used one-hot encoding to convert the object columns into numeric dummy columns:
import pandas as pd
train = pd.get_dummies(train)
test = pd.get_dummies(test)
print('Train dummies shape: {}'.format(train.shape))
print('Test dummies shape: {}'.format(test.shape))
Running the code above prints:
Train dummies shape: (307511, 246)
Test dummies shape: (48744, 242)
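One thing worth checking here: pd.get_dummies only expands object/categorical columns. Numeric columns are passed through unchanged, so any NaN already present in the float64 columns survives the encoding. A minimal sketch on a tiny made-up frame (the column names amount and kind are hypothetical, not from the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one float column with a missing value, one object column
df = pd.DataFrame({
    "amount": [1.5, np.nan, 3.0],
    "kind": ["a", "b", None],
})

encoded = pd.get_dummies(df)

# The object column is replaced by indicator columns (kind_a, kind_b);
# the row where kind was None gets 0/False in both indicators
print(list(encoded.columns))

# ...but the NaN in the numeric column is still there after encoding
print(encoded["amount"].isna().sum())
```

So a larger column count after encoding only shows that the object columns were expanded; it says nothing about missing values elsewhere in the frame.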
The number of columns increased, so the one-hot encoding worked. But now, when I try to train and test my model, I get this error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32')
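The message names three possible causes: NaN, infinity, or a finite value too large for float32. A minimal sketch for locating the offending columns, assuming train and test are the DataFrames after get_dummies (report_bad_values is a hypothetical helper name, and the toy frame below is made up):

```python
import numpy as np
import pandas as pd

FLOAT32_MAX = np.finfo(np.float32).max

def report_bad_values(df: pd.DataFrame) -> pd.DataFrame:
    """Per numeric column, count NaN, +/-inf, and finite values too large for float32."""
    numeric = df.select_dtypes(include=[np.number])
    is_inf = numeric.isin([np.inf, -np.inf])
    # Finite values whose magnitude exceeds the float32 range
    too_large = (numeric.abs() > FLOAT32_MAX) & ~is_inf
    counts = pd.DataFrame({
        "n_nan": numeric.isna().sum(),
        "n_inf": is_inf.sum(),
        "n_too_large": too_large.sum(),
    })
    # Keep only columns that have at least one problematic value
    return counts[(counts > 0).any(axis=1)]

# Hypothetical toy frame standing in for the encoded train data
df = pd.DataFrame({
    "a": [1.0, np.nan, np.inf],
    "b": [1, 2, 3],
    "c": [1e39, 0.0, 0.0],
})
report = report_bad_values(df)
print(report)
```

Calling report_bad_values(train) on the real encoded frame would list only the columns that need cleaning.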
These are my imports:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import ExtraTreesClassifier
Please help

Answer:
Try this:
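import numpy as np
# DataFrame.as_matrix() and np.float were removed in newer pandas/NumPy releases
train_arr = train.to_numpy(dtype=np.float64)
test_arr = test.to_numpy(dtype=np.float64)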
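Casting to float64 keeps NaN as NaN and inf as inf, so on its own it may not clear this error if the data really contains missing values. One common option, swapped in here as an assumption rather than something from the question, is median imputation with scikit-learn's SimpleImputer, fitted on the training data only. The helper clean_features and the toy arrays are hypothetical:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

def clean_features(X_train, X_test):
    """Replace +/-inf with NaN, then fill NaN with per-column medians learned from X_train."""
    X_train = np.asarray(X_train, dtype=np.float64).copy()
    X_test = np.asarray(X_test, dtype=np.float64).copy()
    # Treat infinities as missing so the imputer handles them too
    X_train[np.isinf(X_train)] = np.nan
    X_test[np.isinf(X_test)] = np.nan
    imputer = SimpleImputer(strategy="median")
    # Fit on training data only, then apply the same medians to the test data
    return imputer.fit_transform(X_train), imputer.transform(X_test)

# Hypothetical toy data standing in for the encoded train/test features
X_train = np.array([[1.0, np.nan], [3.0, 4.0], [np.inf, 8.0]])
y_train = np.array([0, 1, 1])
X_test = np.array([[np.nan, 2.0]])

X_train_clean, X_test_clean = clean_features(X_train, X_test)
model = LogisticRegression().fit(X_train_clean, y_train)
print(model.predict(X_test_clean))
```

Fitting the imputer on the training set alone avoids leaking test-set statistics into training; whether the median is a sensible fill value depends on the columns involved.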