Reputation: 53
I want to check if the words in a dataframe B row exist inside another dataframe A row, and retrieve the LineNumber of dataframe A.
Example of dataframe A
LineNumber Description
2539 5401845 Either the well was very deep, or she fell very slowly,
4546 5409117 for she had plenty of time as she went down to look about her,
4368 5408517 and to wonder what was going to happen next
Example of dataframe B
Words
50062 well deep fell
44263 plenty time above
4731 plenty time down look
I want to know if ALL the words in each row of dataframe B are inside any row in dataframe A. If that's the case, I'd retrieve the LineNumber from dataframe A and assign it to dataframe B.
The output should be like this.
Words LineNumber
50062 well deep fell 5401845
44263 plenty time above
4731 plenty time down look 5409117
I have tried something like this, but it's not working:
a = 'for she had plenty of time as she went down to look about her,'
str = 'plenty time down look'
if all(x in str for x in a):
    print(True)
else:
    print(False)
Thanks
Upvotes: 3
Views: 63
Reputation: 1207
Make DataFrames
import pandas as pd

x = pd.DataFrame({"Description": ["for she had plenty of time as she went down to look about her",
                                  "for she had of time as she went down to look about her"]})
>>> x
Description
0 for she had plenty of time as she went down to look about her
1 for she had of time as she went down to look about her
y = pd.DataFrame({"Description": ["plenty time down look"]})
>>> y
Description
0 plenty time down look
Take the words from a row of dataframe y, build a regex with one lookahead per word, and get the index of the row in dataframe x whose Description contains all of them
with_words = y["Description"].iloc[[0]].item().split()
with_regex = "".join(['(?=.*{})'.format(word) for word in with_words])
>>> with_regex
'(?=.*plenty)(?=.*time)(?=.*down)(?=.*look)'
>>> x.loc[(x.Description.str.contains(with_regex))].index.item()
0
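To get from the single-row lookup above to the output the question asks for, the same lookahead pattern can be built for every row of B. This is only a sketch under a few assumptions of mine: the frames are named A and B as in the question, the helper first_matching_line is a hypothetical name, the first matching LineNumber wins when several rows of A match, and I added re.escape plus \b word boundaries so that a word such as "look" does not match inside "looking". Note that .index.item() in the snippet above raises an error unless exactly one row matches, so this version takes the first hit or leaves a missing value instead.

```python
import re
import pandas as pd

# Sample data copied from the question
A = pd.DataFrame(
    {
        "LineNumber": [5401845, 5409117, 5408517],
        "Description": [
            "Either the well was very deep, or she fell very slowly,",
            "for she had plenty of time as she went down to look about her,",
            "and to wonder what was going to happen next",
        ],
    },
    index=[2539, 4546, 4368],
)
B = pd.DataFrame(
    {"Words": ["well deep fell", "plenty time above", "plenty time down look"]},
    index=[50062, 44263, 4731],
)


def first_matching_line(words):
    """Hypothetical helper: LineNumber of the first row of A containing every word."""
    # One lookahead per word; \b restricts each match to a whole word.
    pattern = "".join(r"(?=.*\b{}\b)".format(re.escape(w)) for w in words.split())
    hits = A.loc[A["Description"].str.contains(pattern), "LineNumber"]
    # Assumption: take the first match, or a missing value when nothing matches.
    return hits.iloc[0] if not hits.empty else pd.NA


B["LineNumber"] = B["Words"].apply(first_matching_line)
print(B)
```

This runs one regex scan of A per row of B, so it may be slow if both frames are large.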
Upvotes: 2
Reputation: 639
You are close with what you are trying to do. Try something like this:
a = 'for she had plenty of time as she went down to look about her,'
string = 'plenty time down look'
a = a.split(' ')
string = string.split(' ')
if all(x in a for x in string):
    print(True)
else:
    print(False)
The way you originally had it, x in string for x in a, has two issues. The first is that every element in string and a is a char, so to compare words you need to create a list of words, which is why I included the split. The second is that the logic x in string for x in a says return True if each element in a is in string, but what you want is x in a for x in string, which will return True if each element in string is in a.
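If it helps, here is a rough sketch of how the same word-by-word check could be applied to the two dataframes from the question. The names to_word_set and find_line_number are my own, not from the question. I am also assuming that punctuation and letter case should be ignored (a plain split(' ') would keep "deep," with its comma, so "deep" would not be found in the first row), and that the first matching row of A is the one you want.

```python
import re
import pandas as pd

# Sample data copied from the question
A = pd.DataFrame(
    {
        "LineNumber": [5401845, 5409117, 5408517],
        "Description": [
            "Either the well was very deep, or she fell very slowly,",
            "for she had plenty of time as she went down to look about her,",
            "and to wonder what was going to happen next",
        ],
    },
    index=[2539, 4546, 4368],
)
B = pd.DataFrame(
    {"Words": ["well deep fell", "plenty time above", "plenty time down look"]},
    index=[50062, 44263, 4731],
)


def to_word_set(text):
    """Hypothetical helper: lowercase the text and keep only runs of word characters."""
    return set(re.findall(r"\w+", text.lower()))


# Split each description once, up front, instead of once per row of B.
a_word_sets = A["Description"].map(to_word_set)


def find_line_number(words):
    """Hypothetical helper: first LineNumber in A whose words include all of `words`."""
    needed = to_word_set(words)
    for idx, available in a_word_sets.items():
        # Subset test: equivalent to all(w in available for w in needed)
        if needed <= available:
            return A.at[idx, "LineNumber"]
    return pd.NA  # no row of A contains every word


B["LineNumber"] = B["Words"].map(find_line_number)
print(B)
```

The subset test on sets does the same job as the all(...) expression, and building the sets once avoids splitting every description again for each row of B.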
Upvotes: 0