Snow
Snow

Reputation: 1138

How to map a DataFrame to another based on a column that starts with strings contained in the first DataFrame?

I have two DataFrames and I want to map information from one to the other.

First DataFrame:

    A         B    
0   pokaichi  56156 
1   klavatur  45156  
2   monopol   32154 
3   berate    98741 

Second DataFrame:

    C   D      E
0   be  18485  Bond
1   mo  14285  Roger
2   kl  74853  Mika
3   po  85655  Loki

How can I map the E column from the second DataFrame to the first DataFrame, so that the new column from the first DataFrame is a result of the string in the A column that starts with the string in C column?

So the first DataFrame should look like this after applying the mapping:

    A         B      E
0   pokaichi  56156  Loki
1   klavatur  45156  Mika
2   monopol   32154  Roger
3   berate    98741  Bond

I tried using str.startswith with map, but to no success.

Upvotes: 2

Views: 217

Answers (2)

Asish M.
Asish M.

Reputation: 2647

In [31]: df1
Out[31]:
          A      B
0  pokaichi  56156
1  klavatur  45156
2   monopol  32154
3    berate   9741

In [32]: df2
Out[32]:
    C      D      E
0  be  18485   Bond
1  mo  14285  Roger
2  kl  74853   Mika
3  po  85655   Loki

In [33]: df1['C'] = df1['A'].str[:2].map(df2.set_index('C')['E'])

In [34]: df1
Out[34]:
          A      B      C
0  pokaichi  56156   Loki
1  klavatur  45156   Mika
2   monopol  32154  Roger
3    berate   9741   Bond

Alternatively, you could do a merge if you are okay with creating a new dataframe.

In [43]: df1.merge(df2[['C', 'E']], left_on=df1['A'].str[:2], right_on='C').drop('C', axis=1)
Out[43]:
          A      B      E
0  pokaichi  56156   Loki
1  klavatur  45156   Mika
2   monopol  32154  Roger
3    berate   9741   Bond

Upvotes: 2

anky
anky

Reputation: 75120

You can use the below steps:

pat=r'({})'.format('|'.join(df2.C))
#'(be|mo|kl|po)'
d=df2.set_index('C')['E'] #creating a series with C as Index and E as values

Then we use: series.str.extract() to extract df2.C from df1.A and using series.map() we map the values to generate the expected output.

df1['E']=df1.A.str.extract(pat,expand=False).map(d)
print(df1)

          A      B      E
0  pokaichi  56156   Loki
1  klavatur  45156   Mika
2   monopol  32154  Roger
3    berate  98741   Bond

Upvotes: 2

Related Questions