Arun
Arun

Reputation: 669

pandas merge with condition

I have two dataframe and want to merge it based on max of another column

df1:
C2
A
B
C

df2:
C1 C2 val
X  A  100
Y  A  50.5
Z  A  60
E  B  90
F  B  45
G  C  100

I tried,

df3 = df1.merge(df2, on='C2', how='inner')['val'].max()

I get the error, AttributeError: 'numpy.float64' object has no attribute 'head'

The val column has only numbers. How should I modify this and Why do I encounter this error ?

The expected output is:

df3:
C2 C1 val
A  X  100
B  E  90
C  G  100

Thanks in advance.

Upvotes: 1

Views: 3523

Answers (1)

jezrael
jezrael

Reputation: 863291

I think you need merge by left join:

df3 = df2.merge(df1, on='C2', how='left')

And then groupby with idxmax for indices of max values per groups and select rows by loc:

df3 = df3.loc[df3.groupby('C2')['val'].idxmax()]

Or use sort_values with drop_duplicates:

df3 = df3.sort_values(['C2', 'val']).drop_duplicates('C2', keep='last')
print (df3)
  C1 C2    val
0  X  A  100.0
3  E  B   90.0
5  G  C  100.0

Why do I encounter this error ?

Problem is you get scalar - max value of column val:

df3 = df1.merge(df2, on='C2', how='inner')['val'].max()
print (df3)
100.0

So if use print (df3.head()) it failed.

Upvotes: 2

Related Questions