Fatima
Fatima

Reputation: 495

merge two data frame after use str.contains?

I have two data frames I want to match partial strings by using str.contains function then merge them.

Here is an example:

data1

      email     is_mane         name           id
    [email protected]     1   there is rain          10
    [email protected]    1   here is the food        9
    [email protected]    1   let's go together       8
    [email protected]    1   today is my birthday    6


data2

    id  name
    1   the rain is beautiful
    1   the food
    2   together
    4   my birthday
    3   your birthday

And here is the code I wrote:

data.loc[data.name.str.contains('|'.join(data2.name)),:]

and the output:

        email   is_mane     name               id
    [email protected]    1   here is the food        9
    [email protected]    1   let's go together       8
    [email protected]    1   today is my birthday    6

As you can see it did not return "there is rain" even that rain word is contained in dara2: could it be because of space?

Also I want to merge data1 with data2 so that will help me to know what email has match.

I would like to have the following output:


        email   is_mane     name               id      id2       name2
    [email protected]    1   here is the food        9       1       the food
    [email protected]    1   let's go together       8       2       together
    [email protected]    1   today is my birthday    6       4       my birthday
    [email protected]    1   today is my birthday    6       3       your birthday

Is there is any way to do it?

Upvotes: 0

Views: 62

Answers (1)

Georgina Skibinski
Georgina Skibinski

Reputation: 13377

If you're good with matching only full words you can do (so e.g. dog and dogs won't match)

data1["key"]=data1["name"].str.split(r"[^\w+]")
data2["key"]=data2["name"].str.split(r"[^\w+]")

data3=data1.explode("key").merge(data2.explode("key"), on="key", suffixes=["", "2"]).drop("key", axis=1).drop_duplicates()

Otherwise it's a matter of cross join, and applying str.contains(...) to filter out the ones, which aren't matching.

Upvotes: 2

Related Questions