Crusader

Reputation: 333

Insert value from one dataframe to another based on search

I have two dataframes:

import pandas as pd

df1 = pd.DataFrame(
{
    'sentence': ['text1', 'text2', 'text3', 'text1', 'text1', 'text2'],
    'label': ['abc', 'abc', 'abc', 'def', 'ghi', 'ghi']
}
)

df2 = pd.DataFrame(
{
    'sentence': ['html_text1', 'html_text2', 'html_text3', 'html_text4'],
    'label': ['abc', 'abc', 'def', 'ghi']
}
)

I want to combine the two dataframes into a new dataframe, with the following condition:

When a label in df2 matches a label in df1, that df2 record should be inserted above the matching df1 records. The expected final dataframe was attached as an image (not reproduced here).

P.S.: I have not worked out the logic yet, so I have no sample code to show. I am currently trying to use DataFrame.iterrows() for this.

Upvotes: 0

Views: 37

Answers (1)

mozway

Reputation: 260600

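Use concat, then sort_values with a stable sort. Because df2 comes first in the concat, a stable sort keeps its rows above the df1 rows that share the same label: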

out = (pd.concat([df2, df1])
         .sort_values('label', kind='stable', ignore_index=True)
         [['label', 'sentence']]
      )

Output:

  label    sentence
0   abc  html_text1
1   abc  html_text2
2   abc       text1
3   abc       text2
4   abc       text3
5   def  html_text3
6   def       text1
7   ghi  html_text4
8   ghi       text1
9   ghi       text2
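
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the question's frames and applies the same concat plus stable sort. The helper name interleave_by_label is made up for illustration and is not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame({
    'sentence': ['text1', 'text2', 'text3', 'text1', 'text1', 'text2'],
    'label': ['abc', 'abc', 'abc', 'def', 'ghi', 'ghi'],
})
df2 = pd.DataFrame({
    'sentence': ['html_text1', 'html_text2', 'html_text3', 'html_text4'],
    'label': ['abc', 'abc', 'def', 'ghi'],
})


def interleave_by_label(top, bottom):
    """Hypothetical helper: place rows of `top` above rows of `bottom` per label."""
    # concat puts every `top` row before every `bottom` row.
    combined = pd.concat([top, bottom])
    # A stable sort preserves that relative order among rows with equal labels.
    ordered = combined.sort_values('label', kind='stable', ignore_index=True)
    return ordered[['label', 'sentence']]


out = interleave_by_label(df2, df1)
print(out)
```

Note that the label groups come out in sorted label order, which happens to match the order of labels in df1 here. If df1's labels were not already sorted, the groups would be reordered.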

To keep only the df2 rows whose label also appears in df1:

out = (pd.concat([df2[df2['label'].isin(df1['label'])], df1])
         .sort_values('label', kind='stable', ignore_index=True)
         [['label', 'sentence']]
      )
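
A small sketch of what the isin filter changes, assuming a df2 that carries a label missing from df1. The frames below (df1_small, df2_extra) and the 'xyz' label are invented for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: 'xyz' appears only in df2_extra.
df1_small = pd.DataFrame({
    'sentence': ['text1', 'text2'],
    'label': ['abc', 'def'],
})
df2_extra = pd.DataFrame({
    'sentence': ['html_text1', 'html_text5'],
    'label': ['abc', 'xyz'],
})

# Boolean mask: True where the df2_extra label also occurs in df1_small.
mask = df2_extra['label'].isin(df1_small['label'])

filtered = (pd.concat([df2_extra[mask], df1_small])
              .sort_values('label', kind='stable', ignore_index=True)
              [['label', 'sentence']]
           )
print(filtered)
```

Without the mask, the html_text5 row would appear in the result as an 'xyz' group with no df1 rows under it.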

Upvotes: 2
