Reputation: 495
I have two data frames I want to match partial strings by using str.contains
function then merge them.
Here is an example:
data1
email is_mane name id
[email protected] 1 there is rain 10
[email protected] 1 here is the food 9
[email protected] 1 let's go together 8
[email protected] 1 today is my birthday 6
data2
id name
1 the rain is beautiful
1 the food
2 together
4 my birthday
3 your birthday
And here is the code I wrote:
data.loc[data.name.str.contains('|'.join(data2.name)),:]
and the output:
email is_mane name id
[email protected] 1 here is the food 9
[email protected] 1 let's go together 8
[email protected] 1 today is my birthday 6
As you can see it did not return "there is rain" even that rain
word is contained in dara2
: could it be because of space?
Also I want to merge data1
with data2
so that will help me to know what email has match.
I would like to have the following output:
email is_mane name id id2 name2
[email protected] 1 here is the food 9 1 the food
[email protected] 1 let's go together 8 2 together
[email protected] 1 today is my birthday 6 4 my birthday
[email protected] 1 today is my birthday 6 3 your birthday
Is there is any way to do it?
Upvotes: 0
Views: 62
Reputation: 13377
If you're good with matching only full words you can do (so e.g. dog
and dogs
won't match)
data1["key"]=data1["name"].str.split(r"[^\w+]")
data2["key"]=data2["name"].str.split(r"[^\w+]")
data3=data1.explode("key").merge(data2.explode("key"), on="key", suffixes=["", "2"]).drop("key", axis=1).drop_duplicates()
Otherwise it's a matter of cross join, and applying str.contains(...)
to filter out the ones, which aren't matching.
Upvotes: 2