Luluperam

Reputation: 53

Find words in dataframe row inside another dataframe row

I want to check if the words in a dataframe B row exist inside another dataframe A row, and retrieve the LineNumber of dataframe A.

Example of dataframe A

      LineNumber               Description
2539  5401845  Either the well was very deep, or she fell very slowly,
4546  5409117  for she had plenty of time as she went down to look about her, 
4368  5408517  and to wonder what was going to happen next

Example of dataframe B

                 Words
50062   well deep fell
44263   plenty time above
4731    plenty time down look

I want to know whether ALL the words in each row of dataframe B appear in any single row of dataframe A. If they do, I'd like to retrieve the LineNumber from dataframe A and assign it to that row of dataframe B.

The output should be like this.

                     Words             LineNumber
50062   well deep fell                 5401845
44263   plenty time above
4731    plenty time down look          5409117

I have tried something like this, but it's not working:

a = 'for she had plenty of time as she went down to look about her,'
str = 'plenty time down look'
if all(x in str for x in a):
    print(True)
else:
    print(False)

Thanks

Upvotes: 3

Views: 63

Answers (2)

ChrisDanger

Reputation: 1207

Make DataFrames

import pandas as pd

x = pd.DataFrame({"Description": ["for she had plenty of time as she went down to look about her",
                                  "for she had of time as she went down to look about her"]})

>>> x
    Description
0   for she had plenty of time as she went down to look about her
1   for she had of time as she went down to look about her

y = pd.DataFrame({"Description": ["plenty time down look"]})
>>> y
    Description
0   plenty time down look

Take the words from a row of dataframe y, build a regex that requires all of them, and get the index of the matching row in dataframe x

import re
with_words = y["Description"].iloc[0].split()
# One lookahead per word; re.escape keeps regex metacharacters literal
with_regex = "".join("(?=.*{})".format(re.escape(word)) for word in with_words)

>>> with_regex
'(?=.*plenty)(?=.*time)(?=.*down)(?=.*look)'

>>> x.loc[(x.Description.str.contains(with_regex))].index.item()
0
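
To cover every row of dataframe B rather than a single one, the same lookahead idea can be wrapped in a function. This is only a sketch under a few assumptions: the frames are named df_a and df_b (my names, with data copied from the question), first_matching_line is a hypothetical helper, and when several rows match it simply keeps the first. I also added \b word boundaries so that "time" does not match inside a longer word, which the regex above does not do.

```python
import re
import pandas as pd

# Sample data copied from the question.
df_a = pd.DataFrame({
    "LineNumber": [5401845, 5409117, 5408517],
    "Description": [
        "Either the well was very deep, or she fell very slowly,",
        "for she had plenty of time as she went down to look about her,",
        "and to wonder what was going to happen next",
    ],
})
df_b = pd.DataFrame({"Words": ["well deep fell",
                               "plenty time above",
                               "plenty time down look"]})


def first_matching_line(words, frame):
    """Hypothetical helper: LineNumber of the first row of `frame` whose
    Description contains every word in `words`, or None if no row does."""
    # One lookahead per word; re.escape keeps metacharacters literal and
    # \b restricts each word to whole-word matches.
    pattern = "".join(r"(?=.*\b{}\b)".format(re.escape(w)) for w in words.split())
    mask = frame["Description"].str.contains(pattern, regex=True)
    hits = frame.loc[mask, "LineNumber"]
    return hits.iloc[0] if not hits.empty else None


# Rows of df_b with no match end up with a missing value.
df_b["LineNumber"] = [first_matching_line(w, df_a) for w in df_b["Words"]]
print(df_b)
```

This builds and runs one regex per row of df_b against every row of df_a, so it may get slow on large frames.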

Upvotes: 2

Denver

Reputation: 639

You are close with what you are trying to do. Try something like this:

a = 'for she had plenty of time as she went down to look about her,'
string = 'plenty time down look'
a = a.split(' ')
string = string.split(' ')
if all(x in a for x in string):
    print(True)
else:
    print(False)

The way you originally had it, x in str for x in a, has two issues (I renamed str to string above so that it no longer shadows the built-in). The first is that a and string are both plain strings, so iterating over one yields single characters, not words. To compare words you need a list of words for each, which is why I added the split calls.

The second is that the logic x in string for x in a returns True only if every element of a is in string. What you want is the reverse, x in a for x in string, which returns True if every element of string is in a.
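
One caveat when applying this to the sample data: splitting on spaces leaves punctuation attached, so the first description yields 'deep,' rather than 'deep' and the check for 'well deep fell' would fail. Below is a rough sketch that pulls out words with a regex and uses a set subset test instead of all(). The names df_a, df_b, tokens and find_line are mine, not from the question, and lowercasing is an assumption about what counts as a match.

```python
import re
import pandas as pd


def tokens(text):
    """Set of lowercased words in `text`, with punctuation dropped."""
    return set(re.findall(r"\w+", text.lower()))


# Sample data copied from the question.
df_a = pd.DataFrame({
    "LineNumber": [5401845, 5409117, 5408517],
    "Description": [
        "Either the well was very deep, or she fell very slowly,",
        "for she had plenty of time as she went down to look about her,",
        "and to wonder what was going to happen next",
    ],
})
df_b = pd.DataFrame({"Words": ["well deep fell",
                               "plenty time above",
                               "plenty time down look"]})

# Tokenize every description once, up front.
a_tokens = [(line, tokens(desc))
            for line, desc in zip(df_a["LineNumber"], df_a["Description"])]


def find_line(words):
    """Hypothetical helper: LineNumber of the first description that
    contains all of `words`, or None if there is no such description."""
    wanted = tokens(words)
    for line, have in a_tokens:
        if wanted <= have:  # subset test: every wanted word is present
            return line
    return None


df_b["LineNumber"] = df_b["Words"].map(find_line)
print(df_b)
```

The subset test does the same job as all(x in a for x in string), and because each description is tokenized only once, it avoids re-splitting dataframe A for every row of dataframe B.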

Upvotes: 0
