Reputation: 58
I have 2 dataframes that are identical except for one column. I am hoping to merge the two together and conditionally accept the value of that column. In this case I am looking for the max of the two, but in general any conditional would be ideal.
import pandas as pd
df1 = pd.DataFrame([['Tom', 30], ['Jane', 40], ['Barry', 22], ['Kelly', 15]])
df2 = pd.DataFrame([['Tom', 10], ['Jane', 50], ['Barry', 22]])
df1:
0 1
0 Tom 30
1 Jane 40
2 Barry 22
3 Kelly 15
df2:
0 1
0 Tom 10
1 Jane 50
2 Barry 22
I am looking to end up with a data frame that merges the two and takes the max of column 1.
Example:
0 1
0 Tom 30
1 Jane 50
2 Barry 22
3 Kelly 15
Upvotes: 0
Views: 52
Reputation: 26676
Another way: concat, sort_values and drop_duplicates. Code below:
pd.concat([df2, df1]).sort_values(by=[0, 1], ascending=[False, True]).drop_duplicates(subset=[0], keep='last')
0 1
0 Tom 30
3 Kelly 15
1 Jane 50
2 Barry 22
Upvotes: 1
Reputation: 28649
Merge the data, setting how as outer, before grouping to get the max:
df1.merge(df2, how='outer').groupby(0, as_index = False, sort=False).max()
0 1
0 Tom 30
1 Jane 50
2 Barry 22
3 Kelly 15
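If you need a rule other than max, one possible approach is to merge on the name column only, so both values sit side by side, and then combine them row-wise. This is just a sketch: the column names name and value and the suffixes _1 and _2 are made up for illustration and are not in your original frames.

```python
import pandas as pd

df1 = pd.DataFrame([['Tom', 30], ['Jane', 40], ['Barry', 22], ['Kelly', 15]])
df2 = pd.DataFrame([['Tom', 10], ['Jane', 50], ['Barry', 22]])

# Illustrative column names (assumption): the originals are just 0 and 1.
left = df1.rename(columns={0: 'name', 1: 'value'})
right = df2.rename(columns={0: 'name', 1: 'value'})

# Outer merge on the name only, so each row carries both candidate values.
merged = left.merge(right, on='name', how='outer', suffixes=('_1', '_2'))

# Row-wise rule: here the max. NaN (name missing from one frame) is skipped
# by default, so Kelly keeps the only value available.
merged['value'] = merged[['value_1', 'value_2']].max(axis=1)

result = merged[['name', 'value']]
print(result)
```

The value column comes back as float because the outer merge introduces NaN for names missing from one frame. To use a different condition, replace the max(axis=1) line with your own row-wise logic, keeping in mind that comparisons against NaN evaluate to False.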
Upvotes: 1