clstaudt
clstaudt

Reputation: 22448

joining pandas dataframes by string prefix

I am looking for an efficient way to perform the following join on two pandas.DataFrames:

The first one contains in column A strings like:

A  ....

Spam|One
Spam|Two
Ham
Eggs

The second one is a reference table with string keys as the index and a text description:

index      description
Spam       "..."
Ham        "..." 
Eggs       "...."
BakedBeans "...."

Each key in the index can be a prefix to a string in A. Now I'd like to join the description column to column A where the value in A matches the prefix. I have a feeling that there is an efficient pandas one-liner for it...

We can assume that all values in A have the format "prefix|suffix" or "prefix". Perhaps that can speed up things.

Upvotes: 2

Views: 866

Answers (1)

jezrael
jezrael

Reputation: 863166

Use Series.str.split with select first values of lists and Series.map by Series:

print (df1)
          A
0  Spam|One
1  Spam|Two
2       Ham
3      Eggs

print (df2)  
     description
Spam         aaa
Ham           sd
Eggs         rty

print (df2.index)
Index(['Spam', 'Ham', 'Eggs'], dtype='object')

df1['description'] = df1['A'].str.split('|').str[0].map(df2['description'])
print (df1)
          A description
0  Spam|One         aaa
1  Spam|Two         aaa
2       Ham          sd
3      Eggs         rty

More general solution should be Series.str.extract, with ^ for extract starting strings and join by | for regex OR by df2.index:

pat = '(' + '|'.join('^' + df2.index) + ')'
df1['new'] = df1['A'].str.extract(pat,expand=False).map(df2['description'])

Upvotes: 1

Related Questions