Reputation: 333
I have two DataFrames, as shown below:
import pandas as pd

df1 = pd.DataFrame(
    {
        'sentence': ['text1', 'text2', 'text3', 'text1', 'text1', 'text2'],
        'label': ['abc', 'abc', 'abc', 'def', 'ghi', 'ghi']
    }
)
df2 = pd.DataFrame(
    {
        'sentence': ['html_text1', 'html_text2', 'html_text3', 'html_text4'],
        'label': ['abc', 'abc', 'def', 'ghi']
    }
)
I want to iterate over the two DataFrames and create a new DataFrame. The condition for building it is:
When a label in df2 matches a label in df1, that df2 record should be inserted above the matching df1 records. So the final DataFrame should look like:
P.S.: I have not been able to work out the logic yet, so I cannot post sample code. However, I am trying to use DataFrame.iterrows() for this case.
Upvotes: 0
Views: 37
Reputation: 260600
Use concat and sort_values with a stable sort:
out = (pd.concat([df2, df1])
         .sort_values('label', kind='stable', ignore_index=True)
         [['label', 'sentence']]
      )
Output:
  label    sentence
0   abc  html_text1
1   abc  html_text2
2   abc       text1
3   abc       text2
4   abc       text3
5   def  html_text3
6   def       text1
7   ghi  html_text4
8   ghi       text1
9   ghi       text2
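The stable sort is what keeps the df2 rows ahead of the df1 rows within each label: rows with equal sort keys keep the order they had after concatenation, and df2 is concatenated first. Here is a small sketch to check that, assuming the same df1 and df2 as in the question. The `source` column is a helper added only for this illustration; it is not part of the approach above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'sentence': ['text1', 'text2', 'text3', 'text1', 'text1', 'text2'],
    'label': ['abc', 'abc', 'abc', 'def', 'ghi', 'ghi'],
})
df2 = pd.DataFrame({
    'sentence': ['html_text1', 'html_text2', 'html_text3', 'html_text4'],
    'label': ['abc', 'abc', 'def', 'ghi'],
})

# Tag each row with its origin (illustrative helper column, not in the original).
tagged = pd.concat([df2.assign(source='df2'), df1.assign(source='df1')])

# A stable sort keeps rows with equal labels in their concatenation order.
out = tagged.sort_values('label', kind='stable', ignore_index=True)

# For each label, list the origin of its rows in output order.
order = out.groupby('label')['source'].agg(list).to_dict()
print(order)
```

With a non-stable `kind`, the relative order of rows sharing a label is not guaranteed, so the df2 rows could end up below the df1 rows.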
To keep only the df2 rows whose label also appears in df1:
out = (pd.concat([df2[df2['label'].isin(df1['label'])], df1])
         .sort_values('label', kind='stable', ignore_index=True)
         [['label', 'sentence']]
      )
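With the sample data from the question every df2 label already exists in df1, so the filter changes nothing there. A minimal sketch of the difference, using smaller made-up frames in which the label 'xyz' and the sentence 'html_text9' are hypothetical values chosen for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'sentence': ['text1', 'text2'], 'label': ['abc', 'def']})
# 'xyz' is a made-up label with no match in df1 (illustration only).
df2 = pd.DataFrame({'sentence': ['html_text1', 'html_text9'],
                    'label': ['abc', 'xyz']})

# Keep only the df2 rows whose label occurs in df1.
matched = df2[df2['label'].isin(df1['label'])]

out = (pd.concat([matched, df1])
         .sort_values('label', kind='stable', ignore_index=True)
         [['label', 'sentence']])
print(out)
```

The 'xyz' row is dropped before concatenation, so it never reaches the result.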
Upvotes: 2