Why is the sample size different when concatenating two DataFrames?

Question

I transformed the training set and the test set separately with pandas.get_dummies() to get dummy variables for the categorical features.
Because the two sets do not contain the same categories, the resulting DataFrames have different numbers of columns.
I tried to equalize the dimensions.
But the problem below occurred.
Why is the sample size different when concatenating two dataframes?

jezrael · Accepted Answer

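In my opinion, X_train.index is not a default RangeIndex, so you need to create one before calling concat. When concat joins columns it aligns rows on index labels, so labels that do not match produce extra rows: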

X_train = X_train.reset_index(drop=True)
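To illustrate why the row count grows, here is a minimal sketch. The small `X_train` frame, its index labels, and the column names `a` and `b` are made up for the example; only `reset_index(drop=True)` comes from the answer, and it assumes the concatenation in the question used `axis=1`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train: 3 rows whose index is not 0..2,
# as typically happens after a shuffled train/test split.
X_train = pd.DataFrame({"a": [1, 2, 3]}, index=[5, 7, 9])

# All-zero column built with a default RangeIndex (labels 0, 1, 2).
zeros = pd.DataFrame(np.zeros((X_train.shape[0], 1)), columns=["b"])

# concat with axis=1 aligns rows on index labels and keeps their union,
# so six distinct labels give six rows, padded with NaN.
misaligned = pd.concat([X_train, zeros], axis=1)

# After resetting the index, both frames share labels 0..2.
aligned = pd.concat([X_train.reset_index(drop=True), zeros], axis=1)

print(misaligned.shape)  # (6, 2)
print(aligned.shape)     # (3, 2)
```

Note that `reset_index(drop=True)` discards the original row labels, which matters if they are needed later to match rows back to the source data.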

Another solution is to pass the index parameter so that both DataFrames have the same index:

# Zero-filled frame for the missing dummy columns, using X_train's index
diff_df2 = pd.DataFrame(np.zeros((X_train.shape[0], len(diff_dummy2))),
                        columns=diff_dummy2,
                        index=X_train.index)
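A self-contained sketch of this second approach might look like the following. The data and the dummy column names (`color_red`, `color_blue`, `color_green`) are hypothetical, and `diff_dummy2` is assumed to be a list of the column names missing from `X_train`.

```python
import numpy as np
import pandas as pd

# Hypothetical training frame with a non-default index.
X_train = pd.DataFrame({"color_red": [1, 0, 1]}, index=[5, 7, 9])

# Assumed: names of the dummy columns present in the test set only.
diff_dummy2 = ["color_blue", "color_green"]

# Build the zero-filled frame on X_train's own index so the labels match.
diff_df2 = pd.DataFrame(
    np.zeros((X_train.shape[0], len(diff_dummy2))),
    columns=diff_dummy2,
    index=X_train.index,
)

# Rows align one to one, so the sample size stays at 3.
combined = pd.concat([X_train, diff_df2], axis=1)
print(combined.shape)  # (3, 3)
```

Unlike resetting the index, this keeps the original row labels of `X_train` intact.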
