Reputation: 89
If I have 2 dataframes like these two:
import pandas as pd
df1 = pd.DataFrame({'Type':list('AABAC')})
df2 = pd.DataFrame({'Type':list('ABCDEF'), 'Value':[1,2,3,4,5,6]})
Type
0 A
1 A
2 B
3 A
4 C
Type Value
0 A 1
1 B 2
2 C 3
3 D 4
4 E 5
5 F 6
I would like to add a column in df1 based on the values in df2. df2 only contains unique values, whereas df1 has multiple entries of each value. So the resulting df1 should look like this:
Type Value
0 A 1
1 A 1
2 B 2
3 A 1
4 C 3
My actual dataframe df1 is quite long, so I need something that is efficient (I tried it in a loop but this takes forever).
Upvotes: 3
Views: 4063
Reputation: 2320
Another way to do this is by using the label based indexer loc
. First use the Type
column as the index using .set_index
, then access using the df1
column, and reset the index to the original with .reset_index
:
df2.set_index('Type').loc[df1['Type'],:].reset_index()
Either use this as your new df1
or extract the Value
column:
df1['Value'] = df2.set_index('Type').loc[df1['Type'],:].reset_index()['Value']
Upvotes: 0
Reputation: 393893
As requested I am posting a solution that uses map
without the need to create a temporary dict:
In[3]:
df1['Value'] = df1['Type'].map(df2.set_index('Type')['Value'])
df1
Out[3]:
Type Value
0 A 1
1 A 1
2 B 2
3 A 1
4 C 3
This relies on a couple things, that the key values that are being looked up exist otherwise we get a KeyError
and that we don't have duplicate entries in df2
otherwise setting the index raises InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Upvotes: 4
Reputation: 31662
You could create dict
from your df2
with to_dict
method and then map
result to Type
column for df1
:
replace_dict = dict(df2.to_dict('split')['data'])
In [50]: replace_dict
Out[50]: {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6}
df1['Value'] = df1['Type'].map(replace_dict)
In [52]: df1
Out[52]:
Type Value
0 A 1
1 A 1
2 B 2
3 A 1
4 C 3
Upvotes: 2