Reputation: 3
I have multiple dataframes. The first dataframe has certain strings
df_string = pd.DataFrame({'idx':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'string':['xx01122txt01', 'bea2125', 'spddoc0010', 'bon007', 'xx001122xls04', 'spdxls1122', 'bea1234', 'bon1234', 'xy02125doc00', 'irnppt1260']})
And the other three dataframes have substrings with some sort of mapping
df_name = pd.DataFrame({'code':['bon', 'bea', 'spd'],
'name':['james bond', 'mr bean', 'spider man']})
df_type = pd.DataFrame({'code':['doc', 'txt', 'xls'],
'type':['document', 'text', 'excel']})
df_desc = pd.DataFrame({'id':['1122', '1234', '2990', '2125'],
'desc':['facebook', 'twitter', 'instagram', 'snapchat']})
What I'm trying to do is, look for the substrings in the strings column and create a new dataframe with the mapped data. It needs to look something like this
df_output = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'string': ['xx01122txt01', 'bea2125', 'spddoc0010', 'bon007', 'xx001122xls04', 'spdxls1122', 'bea1234', 'bon1234', 'xy02125doc00', 'irnppt1260'],
'desc': ['facebook', 'snapchat', '-', '-', 'facebook', 'facebook', 'twitter', 'twitter', 'snapchat', '-'],
'type': ['text', '-', 'document', '-', 'excel', 'excel', '-', '-', 'document', '-'],
'name': ['-', 'mr bean', 'spider man', 'james bond', '-', 'spider man', 'mr bean', 'james bond', '-', '-'
]})
Upvotes: 0
Views: 128
Reputation: 30050
You can use .str.extract
to extract the matched string in string
column then map
it to other column of df_name
, df_desc
df_string['desc'] = df_string['string'].str.extract('('+'|'.join(df_desc['id'])+')')[0].map(df_desc.set_index('id')['desc'])
df_string['type'] = df_string['string'].str.extract('('+'|'.join(df_type['code'])+')')[0].map(df_type.set_index('code')['type'])
df_string['name'] = df_string['string'].str.extract('('+'|'.join(df_name['code'])+')')[0].map(df_name.set_index('code')['name'])
print(df_string)
idx string desc type name
0 1 xx01122txt01 facebook text NaN
1 2 bea2125 snapchat NaN mr bean
2 3 spddoc0010 NaN document spider man
3 4 bon007 NaN NaN james bond
4 5 xx001122xls04 facebook excel NaN
5 6 spdxls1122 facebook excel spider man
6 7 bea1234 twitter NaN mr bean
7 8 bon1234 twitter NaN james bond
8 9 xy02125doc00 snapchat document NaN
9 10 irnppt1260 NaN NaN NaN
print(df_string.fillna('-'))
idx string desc type name
0 1 xx01122txt01 facebook text -
1 2 bea2125 snapchat - mr bean
2 3 spddoc0010 - document spider man
3 4 bon007 - - james bond
4 5 xx001122xls04 facebook excel -
5 6 spdxls1122 facebook excel spider man
6 7 bea1234 twitter - mr bean
7 8 bon1234 twitter - james bond
8 9 xy02125doc00 snapchat document -
9 10 irnppt1260 - - -
Upvotes: 1