Reputation: 35
I got 2 dataframe ( lets say df1
, df2
)
df1
has ( name , surname , department )
df2
has ( id , filename )
What I want is -> Merge them (lets say df3
)
df3
-> ( id, filename, name, surname, department )
The common point is filename ends with worker's name.
Example:
Filename : /company/workers/john
Name : john ( No duplicate name vals on df1,df2)
Normally In merge we use common columns but now there is no common column so How can I use this match-up/similarity to combine these 2 dataframes? If I cant use this similarity, how can I merge them ?
Upvotes: 2
Views: 200
Reputation: 2407
Use str.rsplit(r"/",n=1,expand=True)[1].str.title(), where
rsplit: right split
n=1: max split
r"/": raw string, no escape seq.interpreted
expand: create new columns
title: steven --> Steven
Then merge them on "name".
In [25]: df1=pd.DataFrame( {"name":["John","Steven"], "surname":["Smith","Lee"], "departmen":["dep1","dep2"]})
In [26]: df2=pd.DataFrame({"id":[240,250], "filename":["/company/workers/steven", "/company/workers/john"]})
In [27]: df1
Out[27]:
name surname departmen
0 John Smith dep1
1 Steven Lee dep2
In [28]: df2
Out[28]:
id filename
0 240 /company/workers/steven
1 250 /company/workers/john
In [29]: df2["name"]= df2.filename.str.rsplit(r"/",n=1,expand=True)[1].str.title()
In [30]: df2
Out[30]:
id filename name
0 240 /company/workers/steven Steven
1 250 /company/workers/john John
In [31]: pd.merge(df2,df1, on="name")
Out[31]:
id filename name surname departmen
0 240 /company/workers/steven Steven Lee dep2
1 250 /company/workers/john John Smith dep1
Upvotes: 0
Reputation: 13397
Try this one:
pd.merge(df1, df2.apply(lambda x: pd.Series({"name": x.filename.split("/")[-1], "file_id": x.id, "filename": x.filename}), axis=1), on="name", how="left")
Upvotes: 1
Reputation: 59
You just have to split the filename column id df2 with / and then, get the last component
df2['name'] = df2['filename'].str.split('//').str[-1]
then use name column in df2 as key to merge :)
Upvotes: 1