How to assign a value to each group by a groupby query on another dataframe?

Question

Let the following dfs:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "k1": [1, 1, 2, 2, 3, 3, 4, 4, 4],
})

df2 = pd.DataFrame({
    "k2": [1, 1, 2, 2, 3, 4, 4],
    "v2": np.random.rand(7)
})

print(df1)
print("_______")
print(df2)
print("_______")

out:

   k1
0   1
1   1
2   2
3   2
4   3
5   3
6   4
7   4
8   4
_______
   k2        v2
0   1  0.260026
1   1  0.474951
2   2  0.695962
3   2  0.158575
4   3  0.396015
5   4  0.740344
6   4  0.293410
_______

I want to create a new column for df1 such that for every key k1, a corresponding value will be applied such that if k1 == k2, the value will be a function (say max) of v2 of the group in df2 whose key is k2 (k1).

Required output for above case:

   k1  result
0   1  0.474951
1   1  0.474951
2   2  0.695962
3   2  0.695962
4   3  0.396015
5   3  0.396015
6   4  0.740344
7   4  0.740344
8   4  0.740344

It can be assumed that all keys present in k1 are also in k2.

This is probably done with two groupby operations, one for query and one for assignment, but I can't figure out how to tie together the output of one to the input of the other.

Edit:
Please notice the example k1 and k2 are sorted for clarity, but are not guaranteed to be. I also don't want to sort because of o(nlogn) time, and this can be done in o(n)

Shubham Sharma · Accepted Answer

We can try map and groupby

df1['result'] = df1['k1'].map(df2.groupby('k2')['v2'].max())

   k1    result
0   1  0.474951
1   1  0.474951
2   2  0.695962
3   2  0.695962
4   3  0.396015
5   3  0.396015
6   4  0.740344
7   4  0.740344
8   4  0.740344

How to assign a value to each group by a groupby query on another dataframe?

Answers (2)

Related Questions