petetheat
petetheat

Reputation: 89

Pandas: Create dataframe column based on other dataframe

If I have 2 dataframes like these two:

import pandas as pd

df1 = pd.DataFrame({'Type':list('AABAC')})
df2 = pd.DataFrame({'Type':list('ABCDEF'), 'Value':[1,2,3,4,5,6]})

  Type
0    A
1    A
2    B
3    A
4    C

  Type  Value
0    A      1
1    B      2
2    C      3
3    D      4
4    E      5
5    F      6

I would like to add a column in df1 based on the values in df2. df2 only contains unique values, whereas df1 has multiple entries of each value. So the resulting df1 should look like this:

  Type Value
0    A     1
1    A     1
2    B     2
3    A     1
4    C     3

My actual dataframe df1 is quite long, so I need something that is efficient (I tried it in a loop but this takes forever).

Upvotes: 3

Views: 4063

Answers (3)

Vidhya G
Vidhya G

Reputation: 2320

Another way to do this is by using the label based indexer loc. First use the Type column as the index using .set_index, then access using the df1 column, and reset the index to the original with .reset_index:

df2.set_index('Type').loc[df1['Type'],:].reset_index()

Either use this as your new df1 or extract the Value column:

df1['Value'] = df2.set_index('Type').loc[df1['Type'],:].reset_index()['Value']

Upvotes: 0

EdChum
EdChum

Reputation: 393893

As requested I am posting a solution that uses map without the need to create a temporary dict:

In[3]:
df1['Value'] = df1['Type'].map(df2.set_index('Type')['Value'])
df1

Out[3]: 
  Type  Value
0    A      1
1    A      1
2    B      2
3    A      1
4    C      3

This relies on a couple things, that the key values that are being looked up exist otherwise we get a KeyError and that we don't have duplicate entries in df2 otherwise setting the index raises InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Upvotes: 4

Anton Protopopov
Anton Protopopov

Reputation: 31662

You could create dict from your df2 with to_dict method and then map result to Type column for df1:

replace_dict = dict(df2.to_dict('split')['data'])

In [50]: replace_dict
Out[50]: {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6}

df1['Value'] = df1['Type'].map(replace_dict)

In [52]: df1
Out[52]:
  Type  Value
0    A      1
1    A      1
2    B      2
3    A      1
4    C      3

Upvotes: 2

Related Questions