Khaled Koubaa
Khaled Koubaa

Reputation: 547

left join two dataframe using contains substring instead of the whole key columns

I have dataframe of stocks and another dataframe of stocks with company name and ticker:

#df1
Company_name            Industry
Buckle Inc. (The)       Retail
TILLY'S INC             Apparel
Teck Resources Ltd      Mining

#df2
Company_name            Ticker
Buckle Inc              BKE
TILLY'S, INC.           TLYS
Teck Resources Ltd      TECK

I want to add tickers (from df2) to each company name in df1. The problem is company names is sometime different, for example: Buckle Inc. (The) and Buckle Inc but they always share some substring.

I try merge but only work when 2 company names are 100% the same, like Teck Resources Ltd in my example.

# I tried this:
df.merge(df1, on='Company_name', how='left')

#return :
Company_name            Industry        Ticker
Teck Resources Ltd      Mining          TECK

This is the output I need:

Company_name            Industry        Ticker
Buckle Inc. (The)       Retail          BKE
Tilly's Inc.            Apparel         TLYS
Teck Resources Ltd      Mining          TECK

Upvotes: 1

Views: 19

Answers (1)

wwnde
wwnde

Reputation: 26676

I make one assumption, there are no duplicates in the Company_name

    df2['temp']=df2['Company_name'].str.lower().str.extract('(^\w+)')#Extract first word in string in df2.Company_name
    df1['temp']=df1['Company_name'].str.lower().str.extract('(^\w+)')#Extract first word in string in df1.Company_name
    df1=df1.assign(Ticker=df1['temp'].map(dict(zip(df2.temp,df2.Ticker)))).drop('temp',1)#Create dict and map




     Company_name      Ticker
0   Buckle Inc. (The)    BKE
1         TILLY'S INC   TLYS
2  Teck Resources Ltd   TECK

Upvotes: 1

Related Questions