JungleDiff
JungleDiff

Reputation: 3485

How to sort dataframe based on a column in another dataframe in Pandas?

enter image description here

Say I have two dataframes, df1 and df2 in the picture above. I want to sort the df1 based on the Col2 of df2.

So the end result of the df1 should look like the third dataframe in the picture, where the Col2 is in the same values, order in df2.

Upvotes: 29

Views: 35030

Answers (2)

ah bon
ah bon

Reputation: 10011

Please try this:

import pandas as pd

# Assume we have the following two dataframes:
df1 = pd.DataFrame({'Col1': ['a','b','c','d','e', 'f'], 
                    'Col2': ['chocolate','chicken','pizza','icecream','cake', 'pizza'],
                    'Col3': [1, 2, 3, 4, 5, 6],
                    'Col4': ['x', 'y', 'z', 'xx', 'yy', 'zz']})
df2 = pd.DataFrame({'Col1': ['f','g','h','i','j'], 
                    'Col2': ['chicken','cake','icecream','chocolate','pizza']})

# Create a dictionary from df2 with 'Col2' as keys and 'Col1' as values
order_dict = df2.set_index('Col2')['Col1'].to_dict()

# Create a new column in df1 that maps the values in 'Col2' to the values in the dictionary
df1['sort_column'] = df1['Col2'].map(order_dict)

# Sort df1 based on the new column, then drop the new column
df1 = df1.sort_values('sort_column').drop(columns='sort_column')

print(df1)

Out:

  Col1       Col2  Col3 Col4
1    b    chicken     2    y
4    e       cake     5   yy
3    d   icecream     4   xx
0    a  chocolate     1    x
2    c      pizza     3    z
5    f      pizza     6   zz

Upvotes: 0

Spandan Brahmbhatt
Spandan Brahmbhatt

Reputation: 4044

You can use combination of set_index and reindex for this.

Try this code :

df1 = pd.DataFrame({'Col1': ['a','b','c','d','e'], 'Col2': 
['chocolate','chicken','pizza','icecream','cake'] })
Out :
  Col1       Col2
0    a  chocolate
1    b    chicken
2    c      pizza
3    d   icecream
4    e       cake
df2 = pd.DataFrame({'Col1': ['f','g','h','i','j'], 'Col2': ['chicken','cake','icecream','chocolate','pizza'] })
Out :
  Col1       Col2
0    f    chicken
1    g       cake
2    h   icecream
3    i  chocolate
4    j      pizza
df1 = df1.set_index('Col2')
df1 = df1.reindex(index=df2['Col2'])
df1 = df1.reset_index()
Out :
        Col2 Col1
0    chicken    b
1       cake    e
2   icecream    d
3  chocolate    a
4      pizza    c

Upvotes: 48

Related Questions