Reputation: 669
I have two dataframe and want to merge it based on max of another column
df1:
C2
A
B
C
df2:
C1 C2 val
X A 100
Y A 50.5
Z A 60
E B 90
F B 45
G C 100
I tried,
df3 = df1.merge(df2, on='C2', how='inner')['val'].max()
I get the error, AttributeError: 'numpy.float64' object has no attribute 'head'
The val column has only numbers. How should I modify this and Why do I encounter this error ?
The expected output is:
df3:
C2 C1 val
A X 100
B E 90
C G 100
Thanks in advance.
Upvotes: 1
Views: 3523
Reputation: 863291
I think you need merge
by left join:
df3 = df2.merge(df1, on='C2', how='left')
And then groupby
with idxmax
for indices of max values per groups and select rows by loc
:
df3 = df3.loc[df3.groupby('C2')['val'].idxmax()]
Or use sort_values
with drop_duplicates
:
df3 = df3.sort_values(['C2', 'val']).drop_duplicates('C2', keep='last')
print (df3)
C1 C2 val
0 X A 100.0
3 E B 90.0
5 G C 100.0
Why do I encounter this error ?
Problem is you get scalar - max value of column val
:
df3 = df1.merge(df2, on='C2', how='inner')['val'].max()
print (df3)
100.0
So if use print (df3.head())
it failed.
Upvotes: 2