Reputation: 21
Assume we have 2 dfs.
df1:
Product Name1 Name2
0 ABC aaa bbb
1 ASD sss ddd
2 ZCX xxx NaN
3 QWE qqq NaN
4 IOP iii ooo
df2:
Title Col1 Col2
0 aaaLKJ ... ...
1 zzzMNB ... ...
2 dddKJH ... ...
3 ooo123 ... ...
4 aaa098 ... ...
5 ttt676 ... ...
The output df should consists all of the df2 rows, in which the 'Title' starts with either 'Name1' or 'Name2' of df1.
Title Col1 Col2
0 aaaLKJ ... ...
2 dddKJH ... ...
3 ooo123 ... ...
4 aaa098 ... ...
Please let me know how this could be achieved. I have tried several ways with df.loc, isin, str.contains, but without luck ...
Upvotes: 2
Views: 68
Reputation: 88305
You could build a list with all names from both columns in df1
, and use Series.str.startswith
on Title
in df2
, which also works with a tuple
of strings, to find those which start with any term in names
:
names = df1.filter(like='Name').stack().to_numpy()
# array(['aaa', 'bbb', 'sss', 'ddd', 'xxx', 'qqq', 'iii', 'ooo'], dtype=object)
df2 = df2[df2.Title.str.startswith(tuple(names))]
print(df2)
Title Col1 Col2
0 aaaLKJ ... ...
2 dddKJH ... ...
3 ooo123 ... ...
4 aaa098 ... ...
Upvotes: 2