Zein
Zein

Reputation: 43

Reordering rows in a dataframe to match order of rows in another dataframe

I have 2 dataframes df1 and df2 which have the same number of rows but in a different ordering. The dataframes have an ID column which contains a unique identifier for each row, both dataframes have this ID column and this is the column for which I would like the order of the rows in df2 to match in df1. The reason I want the order to be the same is that the dataframes will be passed to sklearn train_test_split and I wish to split the dataframes in the exact same order. How can I do this?

Here is the first dataframe:

df1

Paper ID          Document       Label 
math.123.txt    df1 content 1      0
math.321.txt    df1 content 2      0
math.213.txt    df1 content 3      0

df2

Paper ID         Document        Label 
math.123.txt     df2 content 1         0
math.213.txt     df2 content 2         0
math.321.txt     df2 content 3         0

desired order of df2:

df2

Paper ID         Document            Label 
math.123.txt     df2 content 1         0
math.321.txt     df2 content 3         0
math.213.txt     df2 content 2         0

So essentially I just want to reorder the rows in df2 based on the order of rows in df1["Paper ID"]

Upvotes: 3

Views: 3400

Answers (2)

mozway
mozway

Reputation: 260600

You can use reindex using 'Paper ID' as index, here is a pipeline version:

df2 = (df2.set_index('Paper ID')
          .reindex(df1.set_index('Paper ID').index)
          .reset_index()
       )

Or, if your values are unique, you can try to perform a left join on a df1 with only the key column (untested):

pd.merge(df1[['Paper ID']], df2,
         how='left', on='Paper ID',
         sort=False
         )

Upvotes: 3

afsharov
afsharov

Reputation: 5164

Since you want to order the dataframes according to the Paper ID, you should first set them as the index in both dataframes:

df1.set_index('Paper ID', inplace=True)
df2.set_index('Paper ID', inplace=True)

Now you can reindex df2 to match the order of df1:

df2 = df2.reindex(df1.index)

Finally, reset the indices to restore the default index:

df1.reset_index(inplace=True)
df2.reset_index(inplace=True)

Upvotes: 9

Related Questions