Reputation: 117
I transformed the training set and the test set separately with pandas.get_dummies() to get dummy variables for the categorical features.
Because the training set and the test set contain different categories, the resulting frames ended up with different numbers of columns.
I tried to equalize the dimensions.
But then the problem below occurred.
Why is the sample size different when concatenating two dataframes?
Upvotes: 1
Views: 102
Reputation: 863166
In my opinion X_train.index is not a default RangeIndex, so you need to create one before concat:
X_train = X_train.reset_index(drop=True)
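To see why the row count changes, here is a minimal sketch with made-up data. The frame contents, the index labels 10..12, and the column names cat_x and cat_y are assumptions for illustration, and it assumes the concat is done along axis=1. Since concat aligns rows on index labels, labels that do not match produce extra rows filled with NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical training frame whose index is not 0..n-1
# (this typically happens after a train/test split or row filtering).
X_train = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 11, 12])

# Zero-filled columns created without an explicit index get RangeIndex 0..2.
diff_df = pd.DataFrame(np.zeros((X_train.shape[0], 2)), columns=["cat_x", "cat_y"])

# Column-wise concat takes the union of both indexes: {10, 11, 12} and {0, 1, 2}.
misaligned = pd.concat([X_train, diff_df], axis=1)
print(misaligned.shape)  # (6, 3)

# After resetting the index, both frames share 0..2 and the rows line up.
aligned = pd.concat([X_train.reset_index(drop=True), diff_df], axis=1)
print(aligned.shape)  # (3, 3)
```

Note that reset_index(drop=True) discards the original labels, so it only fits when they are not needed later.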
Another solution is to pass the index parameter so that both DataFrames have the same index:
diff_df2 = pd.DataFrame(np.zeros((X_train.shape[0], len(diff_dummy2))),
                        columns=diff_dummy2,
                        index=X_train.index)
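A small self-contained sketch of this second approach, again with made-up data: the contents of X_train and the names in diff_dummy2 are placeholders, and the final concat along axis=1 is an assumption about how the frames are combined. Reusing X_train.index keeps the original row labels and the row count:

```python
import numpy as np
import pandas as pd

# Hypothetical training frame with a non-default index.
X_train = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 11, 12])

# Placeholder names for the dummy columns missing from the training set.
diff_dummy2 = ["cat_x", "cat_y"]

# Build the zero-filled columns on the same index as X_train.
diff_df2 = pd.DataFrame(
    np.zeros((X_train.shape[0], len(diff_dummy2))),
    columns=diff_dummy2,
    index=X_train.index,
)

# Both frames now carry identical labels, so no extra rows appear.
result = pd.concat([X_train, diff_df2], axis=1)
print(result.shape)  # (3, 3)
```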
Upvotes: 1