fjurt
fjurt

Reputation: 793

Creating new column by mapping to dictionary (with string contain match)

I am trying to create in df1 the column Factor based on the dictionary df2. However the Code columns for mapping are not exactly the same and the dictionary only contain the Code strings partially.

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Ratings':[9.0, 8.0, 5.0, 3.0, 2, 3, 6, 5],
    'Code':['R:EST 5R', 'R:EKG EK', 'R:EKG EK', 'R:EST 5R', 'R:EKGP', 'R:EST 5R', 'R:OID_P', 'R:OID_P']})

df2 = pd.DataFrame({
    'Code':['R:EST', 'R:EKG', 'R:OID'],
    'Factor':[1, 1.3, 0.9]})

So far, I wasn't able to map the data frames correctly, because the columns are not exactly the same. The column Code does not necessary start with "R:".

df1['Factor'] = df1['Code'].map(df2.set_index('Code')['Factor'])

This is how the preferred output would look like:

df3 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Ratings':[9.0, 8.0, 5.0, 3.0, 2, 3, 6, 5],
    'Code':['R:EST 5R', 'R:EKG EK', 'R:EKG EK', 'R:EST 5R', 'R:EKGP', 'R:EST 5R', 'R:OID_P', 'R:OID_P'],
    'Factor':[1, 1.3, 1.3, 1, 1.3, 1, 0.9, 0.9]})

Thanks a lot!

Upvotes: 0

Views: 207

Answers (1)

Amir saleem
Amir saleem

Reputation: 1496

>>> df1['Code'].str[:5].map(df2.set_index('Code')['Factor'])
0    1.0
1    1.3
2    1.3
3    1.0
4    1.3
5    1.0
6    0.9
7    0.9
Name: Code, dtype: float64

>>> (df2.Code
         .apply(lambda x:df1.Code.str.contains(x))
         .T
         .idxmax(axis=1)
         .apply(lambda x:df2.Factor.iloc[x])
)

0    1.0
1    1.3
2    1.3
3    1.0
4    1.3
5    1.0
6    0.9
7    0.9
dtype: float64

Upvotes: 1

Related Questions