Peter MacLeish
Peter MacLeish

Reputation: 35

Join two dataframe based on string occurance,similarity

I got 2 dataframe ( lets say df1, df2)

df1 has ( name , surname , department )

df2 has ( id , filename )

What I want is -> Merge them (lets say df3)

df3 -> ( id, filename, name, surname, department )

The common point is filename ends with worker's name.

Example:

Filename : /company/workers/john 

Name : john ( No duplicate name vals on df1,df2)

Normally In merge we use common columns but now there is no common column so How can I use this match-up/similarity to combine these 2 dataframes? If I cant use this similarity, how can I merge them ?

Upvotes: 2

Views: 200

Answers (3)

kantal
kantal

Reputation: 2407

Use str.rsplit(r"/",n=1,expand=True)[1].str.title(), where
rsplit: right split
n=1: max split
r"/": raw string, no escape seq.interpreted
expand: create new columns
title: steven --> Steven
Then merge them on "name".


In [25]: df1=pd.DataFrame( {"name":["John","Steven"], "surname":["Smith","Lee"], "departmen":["dep1","dep2"]})                

In [26]: df2=pd.DataFrame({"id":[240,250], "filename":["/company/workers/steven", "/company/workers/john"]})                  

In [27]: df1                                                                                                                  
Out[27]: 
     name surname departmen
0    John   Smith      dep1
1  Steven     Lee      dep2

In [28]: df2                                                                                                                  
Out[28]: 
    id                 filename
0  240  /company/workers/steven
1  250    /company/workers/john

In [29]: df2["name"]= df2.filename.str.rsplit(r"/",n=1,expand=True)[1].str.title()                                            

In [30]: df2                                                                                                                  
Out[30]: 
    id                 filename    name
0  240  /company/workers/steven  Steven
1  250    /company/workers/john    John

In [31]: pd.merge(df2,df1, on="name")                                                                                         
Out[31]: 
    id                 filename    name surname departmen
0  240  /company/workers/steven  Steven     Lee      dep2
1  250    /company/workers/john    John   Smith      dep1

Upvotes: 0

Georgina Skibinski
Georgina Skibinski

Reputation: 13397

Try this one:

pd.merge(df1, df2.apply(lambda x: pd.Series({"name": x.filename.split("/")[-1], "file_id": x.id, "filename": x.filename}), axis=1), on="name", how="left")

Upvotes: 1

Noppu
Noppu

Reputation: 59

You just have to split the filename column id df2 with / and then, get the last component

df2['name'] = df2['filename'].str.split('//').str[-1]

then use name column in df2 as key to merge :)

Upvotes: 1

Related Questions