WhoIsKi
WhoIsKi

Reputation: 117

compare values of two column on two difference pandas data frame and return max

I have data frame df1:

import pandas as pd
data1 = {'id': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'C', 5: 'B'}, 'col1': {0: '7', 1: ' ', 2: '8', 3: '3', 4: '5', 5: '1'}}
df1 = pd.DataFrame(data1)

and df2 :

data2 = {'id': {0: 'A', 1: 'B', 2: 'C'}, 'testCol': {0: '0', 1: '4', 2: '1'}}
df2 = pd.DataFrame(data2)

by using pandas or numpy, How can compare df1['col1'] and df2['testCol'] for each id, and return max value in df2['testCol'] or in new column in df2?

result:

ID testCol
A 8
B 4
C 5

OR

ID testCol maxCol
A 0 8
B 4 4
C 1 5

-df1 and df2 are examples.

Upvotes: 2

Views: 54

Answers (3)

Cameron Riddell
Cameron Riddell

Reputation: 13407

You can do this in just a couple steps:

  • Rename "col1" to "testCol" to ensure everything aligns correctly
  • Vertically stack df1 and df2
  • Group by ID, and get the maximum of "testCol"
out = (
    df1.rename(columns={"col1": "testCol"})
    .append(df2)
    .groupby("id", as_index=False)
    ["testCol"].max()
)

print(out)
  id testCol
0  A       8
1  B       4
2  C       5

Upvotes: 2

Serg
Serg

Reputation: 129

Another way is this:

result = (
    df1.set_index('id')
    .merge(df2.set_index('id'), on='id')
    .max(axis=1)
)

which gives:

id
A    7.0
A    0.0
A    8.0
B    4.0
B    4.0
C    5.0
dtype: float64

then you can groupby id and get the overall max:

result = (
    df1.set_index('id')
    .merge(df2.set_index('id'), on='id')
    .max(axis=1)
    .groupby('id')
    .max()
)

output:

id
A    8.0
B    4.0
C    5.0
dtype: float64

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195438

Try:

x = (
    pd.concat(
        [df1.groupby("id")["col1"].max(), df2.set_index("id")["testCol"]],
        axis=1,
    )
    .max(axis=1)
    .astype(int)
    .reset_index(name="testCol")
)
print(x)

Prints:

  id  testCol
0  A        8
1  B        4
2  C        5

Upvotes: 2

Related Questions