김정석

Reputation: 117

Why is the sample size different when concatenating two DataFrames?

I used pandas.get_dummies() to create dummy variables for the categorical features, transforming the training set and the test set separately.
Because the two sets contain different categories, the resulting DataFrames ended up with different numbers of columns.
I tried to make the dimensions equal, but then the following problem occurred:
why is the sample size (number of rows) different after concatenating the two DataFrames?
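The symptom can be reproduced with a small sketch. The toy data, the index labels 7, 2, 5 and the names diff_dummy2 and diff_df2 are hypothetical stand-ins, not the asker's actual data: the training rows keep a non-default index (as they might after a random split), and a zeros DataFrame is built for the dummy columns that appear only in the test set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the training set keeps non-contiguous row labels
train = pd.DataFrame({"color": ["red", "blue", "red"]}, index=[7, 2, 5])
test = pd.DataFrame({"color": ["red", "green"]}, index=[0, 1])

# Encoding each set separately yields different dummy columns
X_train = pd.get_dummies(train)  # color_blue, color_red
X_test = pd.get_dummies(test)    # color_green, color_red

# Dummy columns seen only in the test set
diff_dummy2 = [c for c in X_test.columns if c not in X_train.columns]

# Zeros for the missing columns; no index given, so it defaults to 0..n-1
diff_df2 = pd.DataFrame(np.zeros((X_train.shape[0], len(diff_dummy2))),
                        columns=diff_dummy2)

# concat(axis=1) aligns rows by index label and takes the union of
# {7, 2, 5} and {0, 1, 2}, so rows are added and gaps are filled with NaN
combined = pd.concat([X_train, diff_df2], axis=1)
print(X_train.shape)   # (3, 2)
print(combined.shape)  # (5, 3)
```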


Upvotes: 1

Views: 102

Answers (1)

jezrael

Reputation: 863166

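I think X_train.index is not a default RangeIndex (for example, the row labels can be shuffled after a random train/test split). pd.concat with axis=1 aligns rows by index label, while the new DataFrame of zeros gets a default index 0..n-1, so the result contains the union of both indices: more rows than expected, with NaN wherever a label exists in only one DataFrame. Create a default index before concat: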

X_train = X_train.reset_index(drop=True)
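A minimal sketch of this fix, assuming a toy X_train with index labels 7, 2, 5 and a hypothetical diff_dummy2 list holding one test-only dummy column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's data
X_train = pd.DataFrame({"color_blue": [0, 1, 0], "color_red": [1, 0, 1]},
                       index=[7, 2, 5])
diff_dummy2 = ["color_green"]

# Replace the old labels with a default RangeIndex 0..n-1 (drop=True discards them)
X_train = X_train.reset_index(drop=True)

diff_df2 = pd.DataFrame(np.zeros((X_train.shape[0], len(diff_dummy2))),
                        columns=diff_dummy2)

# Both frames now share the index 0..2, so rows line up one-to-one
combined = pd.concat([X_train, diff_df2], axis=1)
print(combined.shape)  # (3, 3)
```

Keep in mind that after reset_index, X_train no longer shares its index with the original labels (e.g. a y_train Series), so any later index-aligned operation between them would need the same reset.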

Another solution is to pass the index parameter, so that both DataFrames have the same index:

import numpy as np
import pandas as pd

# Reuse X_train's index so concat aligns the rows one-to-one
diff_df2 = pd.DataFrame(np.zeros((X_train.shape[0], len(diff_dummy2))),
                        columns=diff_dummy2,
                        index=X_train.index)
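A minimal sketch of the second approach, again assuming a toy X_train with index labels 7, 2, 5 and a hypothetical one-element diff_dummy2 list. Here the original row labels are kept, which can be convenient if X_train must stay aligned with labels indexed the same way:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's data
X_train = pd.DataFrame({"color_blue": [0, 1, 0], "color_red": [1, 0, 1]},
                       index=[7, 2, 5])
diff_dummy2 = ["color_green"]

# Give the zeros DataFrame exactly the same index as X_train
diff_df2 = pd.DataFrame(np.zeros((X_train.shape[0], len(diff_dummy2))),
                        columns=diff_dummy2,
                        index=X_train.index)

# Identical indices: no extra rows, and the labels 7, 2, 5 are preserved
combined = pd.concat([X_train, diff_df2], axis=1)
print(combined.shape)  # (3, 3)
```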

Upvotes: 1
