Reputation: 28044
Let the following dfs:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({
"k1": [1, 1, 2, 2, 3, 3, 4, 4, 4],
})
df2 = pd.DataFrame({
"k2": [1, 1, 2, 2, 3, 4, 4],
"v2": np.random.rand(7)
})
print(df1)
print("_______")
print(df2)
print("_______")
out:
k1
0 1
1 1
2 2
3 2
4 3
5 3
6 4
7 4
8 4
_______
k2 v2
0 1 0.260026
1 1 0.474951
2 2 0.695962
3 2 0.158575
4 3 0.396015
5 4 0.740344
6 4 0.293410
_______
I want to create a new column for df1
such that for every key k1
, a corresponding value will be applied such that if k1
== k2
, the value will be a function (say max) of v2
of the group in df2
whose key is k2
(k1
).
Required output for above case:
k1 result
0 1 0.474951
1 1 0.474951
2 2 0.695962
3 2 0.695962
4 3 0.396015
5 3 0.396015
6 4 0.740344
7 4 0.740344
8 4 0.740344
It can be assumed that all keys present in k1
are also in k2
.
This is probably done with two groupby operations, one for query and one for assignment, but I can't figure out how to tie together the output of one to the input of the other.
Edit:
Please notice the example k1
and k2
are sorted for clarity, but are not guaranteed to be. I also don't want to sort because of o(nlogn)
time, and this can be done in o(n)
Upvotes: 1
Views: 76
Reputation: 30050
First, you can sort on k2
and v2
columns in df2
to ensure that the bigger value in column v2
stay on first. Then drop duplicates on k2
to keep the first which is the max. At last, map
v2
column in k2
to df1
.
df1['result'] = df1['k1'].map(df2.sort_values(['k2', 'v2'], ascending=[True, False]).drop_duplicates('k2', keep='first').set_index('k2')['v2'])
print(df1)
k1 result
0 1 0.303764
1 1 0.303764
2 2 0.026024
3 2 0.026024
4 3 0.213834
5 3 0.213834
6 4 0.757031
7 4 0.757031
8 4 0.757031
Upvotes: 1
Reputation: 71707
We can try map
and groupby
df1['result'] = df1['k1'].map(df2.groupby('k2')['v2'].max())
k1 result
0 1 0.474951
1 1 0.474951
2 2 0.695962
3 2 0.695962
4 3 0.396015
5 3 0.396015
6 4 0.740344
7 4 0.740344
8 4 0.740344
Upvotes: 3