Reputation: 117
I have data frame df1:
import pandas as pd
data1 = {'id': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'C', 5: 'B'}, 'col1': {0: '7', 1: ' ', 2: '8', 3: '3', 4: '5', 5: '1'}}
df1 = pd.DataFrame(data1)
and df2 :
data2 = {'id': {0: 'A', 1: 'B', 2: 'C'}, 'testCol': {0: '0', 1: '4', 2: '1'}}
df2 = pd.DataFrame(data2)
by using pandas or numpy, How can compare df1['col1'] and df2['testCol'] for each id, and return max value in df2['testCol'] or in new column in df2?
result:
ID | testCol |
---|---|
A | 8 |
B | 4 |
C | 5 |
OR
ID | testCol | maxCol |
---|---|---|
A | 0 | 8 |
B | 4 | 4 |
C | 1 | 5 |
-df1 and df2 are examples.
Upvotes: 2
Views: 54
Reputation: 13407
You can do this in just a couple steps:
out = (
df1.rename(columns={"col1": "testCol"})
.append(df2)
.groupby("id", as_index=False)
["testCol"].max()
)
print(out)
id testCol
0 A 8
1 B 4
2 C 5
Upvotes: 2
Reputation: 129
Another way is this:
result = (
df1.set_index('id')
.merge(df2.set_index('id'), on='id')
.max(axis=1)
)
which gives:
id
A 7.0
A 0.0
A 8.0
B 4.0
B 4.0
C 5.0
dtype: float64
then you can groupby id
and get the overall max:
result = (
df1.set_index('id')
.merge(df2.set_index('id'), on='id')
.max(axis=1)
.groupby('id')
.max()
)
output:
id
A 8.0
B 4.0
C 5.0
dtype: float64
Upvotes: 1
Reputation: 195438
Try:
x = (
pd.concat(
[df1.groupby("id")["col1"].max(), df2.set_index("id")["testCol"]],
axis=1,
)
.max(axis=1)
.astype(int)
.reset_index(name="testCol")
)
print(x)
Prints:
id testCol
0 A 8
1 B 4
2 C 5
Upvotes: 2