Reputation: 63
I have two data frame columns of different lengths. DF1:
filename |
---|
b~cute |
b~cute2 |
nan |
b~cat |
b~dog |
nan |
DF2:
filename |
---|
sgjsg~12345~b~cute~v4.jpeg |
jgsgj~1233~b~dog~v4.jpeg |
jhdjd~1252~b~cat~v4.jpeg |
sggsn~1252~b~cute2~v4.jpeg |
I am trying to do a partial match to create a separate column in DF1 with the filename as such
filename | filepath |
---|---|
b~cute | sgjsg~12345~b~cute~v4.jpeg |
b~cute2 | sggsn~1252~b~cute2~v4.jpeg |
nan | nan |
b~cat | jhdjd~1252~b~cat~v4.jpeg |
b~dog | jgsgj~1233~b~dog~v4.jpeg |
nan | nan |
Upvotes: 2
Views: 68
Reputation: 120559
You can use a regex
:
import re
# Create pattern and escape regex
pat = (df1['filename'].dropna().sort_values(ascending=False)
.map(re.escape).str.cat(sep='|'))
match = df2['filename'].str.extract(f"({pat})", expand=False)
out = df1.merge(df2.rename(columns={'filename': 'filepath'})
.assign(filename=match), on='filename', how='left')
Output:
>>> out
filename filepath
0 b~cute sgjsg~12345~b~cute~v4.jpeg
1 b~cute2 sggsn~1252~b~cute2~v4.jpeg
2 NaN NaN
3 b~cat jhdjd~1252~b~cat~v4.jpeg
4 b~dog jgsgj~1233~b~dog~v4.jpeg
5 NaN NaN
Details:
>>> pat
'b\\~dog|b\\~cute2|b\\~cute|b\\~cat'
Upvotes: 1
Reputation: 649
Make a helper function and use it to create the column on DF1.
def match_fn(fn, filenames):
if not isinstance(fn, str):
return None
for filename in filenames:
if fn in filename:
return filename
return None
df1.loc[:, "filepath"] = df1.filename.apply(lambda fn: match_fn(fn, df2.filename.values))
Upvotes: 1