Reputation: 1126
I have two dataframes like so:
data = {'A': [3, 2, 1, 0], 'B': [1, 2, 3, 4]}
data2 = {'A': [3, 2, 1, 0, 3, 2], 'B': [1, 2, 3, 4, 20, 2], 'C':[5,3,2,1, 5, 1]}
df1 = pd.DataFrame.from_dict(data)
df2 = pd.DataFrame.from_dict(data2)
Now I did a groupby of df2 for C
values_to_map = df2.groupby(['A', 'B']).mean().to_dict()
Now I would like to map df1['new C']
where the columns A and B match.
A B new_C
0 3 1 1.0
1 2 2 2.0
2 1 3 2.0
3 0 4 12.5
where new c is basically the averages of C
for every pair A, B
from df2
Note that A and B don't have to be keys of the dataframe (i.e. they aren't unique identifiers which is why I want to map it with a dictionary originally, but failed with multiple keys)
How would I go about that?
Thank you for looking into it with me!
Upvotes: 4
Views: 3115
Reputation: 1126
I found a solution to this
values_to_map = df2.groupby(['A', 'B']).mean().to_dict()
df1['new_c'] = df1.apply(lambda x: values_to_map[x['A'], x['B']], axis=1)
Thanks for looking into it!
Upvotes: 2
Reputation: 18315
You can first form a MultiIndex from the [["A", "B"]]
subset of the frame df1
and use its map
function to map the A-B pairs to the desired grouped mean values:
cols = ["A", "B"]
mapper = df2.groupby(cols).C.mean()
df1["new_c"] = pd.MultiIndex.from_frame(df1[cols]).map(mapper)
to get
>>> df1
A B new_c
0 3 1 5.0
1 2 2 2.0
2 1 3 2.0
3 0 4 1.0
(if an A-B pair in df1
isn't found in df2
's groups, new_c
corresponding to that pair will be NaN
with this method.)
Note that neither pandas' apply
nor np.vectorize
are "vectorized" routines. However, they might be fast enough for one's purposes and might prove more readable in places.
Upvotes: 1
Reputation: 71610
Just do np.vectorize
:
values_to_map = df2.groupby(['A', 'B']).mean().to_dict()
df1['new_c'] = np.vectorize(lambda x: values_to_map.get(x['A'], x['B']))(df1[['A', 'B']])
Upvotes: 1