Olli
Olli

Reputation: 1126

How to map values to a DataFrame with multiple columns as keys?

I have two dataframes like so:

    data = {'A': [3, 2, 1, 0], 'B': [1, 2, 3, 4]}
    data2 = {'A': [3, 2, 1, 0, 3, 2], 'B': [1, 2, 3, 4, 20, 2], 'C':[5,3,2,1, 5, 1]}
    df1 = pd.DataFrame.from_dict(data)
    df2 = pd.DataFrame.from_dict(data2)

Now I did a groupby of df2 for C

values_to_map = df2.groupby(['A', 'B']).mean().to_dict()

Now I would like to map df1['new C'] where the columns A and B match.

A   B   new_C
0   3   1   1.0
1   2   2   2.0
2   1   3   2.0
3   0   4   12.5

where new c is basically the averages of C for every pair A, B from df2

Note that A and B don't have to be keys of the dataframe (i.e. they aren't unique identifiers which is why I want to map it with a dictionary originally, but failed with multiple keys)

How would I go about that?

Thank you for looking into it with me!

Upvotes: 4

Views: 3115

Answers (3)

Olli
Olli

Reputation: 1126

I found a solution to this

values_to_map = df2.groupby(['A', 'B']).mean().to_dict()

df1['new_c'] = df1.apply(lambda x: values_to_map[x['A'], x['B']], axis=1)

Thanks for looking into it!

Upvotes: 2

Mustafa Aydın
Mustafa Aydın

Reputation: 18315

You can first form a MultiIndex from the [["A", "B"]] subset of the frame df1 and use its map function to map the A-B pairs to the desired grouped mean values:

cols = ["A", "B"]

mapper = df2.groupby(cols).C.mean()

df1["new_c"] = pd.MultiIndex.from_frame(df1[cols]).map(mapper)

to get

>>> df1

   A  B  new_c
0  3  1    5.0
1  2  2    2.0
2  1  3    2.0
3  0  4    1.0

(if an A-B pair in df1 isn't found in df2's groups, new_c corresponding to that pair will be NaN with this method.)


Note that neither pandas' apply nor np.vectorize are "vectorized" routines. However, they might be fast enough for one's purposes and might prove more readable in places.

Upvotes: 1

U13-Forward
U13-Forward

Reputation: 71610

Just do np.vectorize:

values_to_map = df2.groupby(['A', 'B']).mean().to_dict()

df1['new_c'] = np.vectorize(lambda x: values_to_map.get(x['A'], x['B']))(df1[['A', 'B']])

Upvotes: 1

Related Questions