ybin

Python keyword matching (keyword list vs. column)

Suppose I have this dataset:

    Name    Value
0   K   Ieatapple
1   Y   bananaisdelicious
2   B   orangelikesomething 
3   Q   bluegrape
4   C   appleislike

and a keyword list like this:

['apple', 'banana']

I want to match the 'Value' column against the keyword list, where a match means that a keyword from the list appears somewhere in the 'Value' string.

I would like to see how well the keywords in the list match the column, i.e. what the match rate is.

Ultimately, what I want is the percentage of rows that match the keywords and, if possible, the filtered DataFrame.

Thank you.

Edit

In my real dataset, the keywords are embedded inside sentences, for example:

Ilikeapplethanbananaandorange

so exact keyword-to-value (1:1) matching doesn't work.

Answers (1)

Erfan

Use str.contains to match the keywords anywhere inside your sentences:

keywords = ['apple', 'banana']
df['Value'].str.contains("|".join(keywords)).sum() / len(df)

# 0.6
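If you also want to see how often each keyword matches on its own, here is a minimal sketch that rebuilds the question's sample data (assumed to be exactly the five rows shown above). The overall rate is the share of rows containing at least one keyword; the per-keyword breakdown is an extra not in the original answer, and it uses regex=False so each keyword is matched as a literal substring.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "Name": ["K", "Y", "B", "Q", "C"],
    "Value": ["Ieatapple", "bananaisdelicious", "orangelikesomething",
              "bluegrape", "appleislike"],
})
keywords = ["apple", "banana"]

# Overall rate: mean of a boolean mask = share of rows with any keyword
overall = float(df["Value"].str.contains("|".join(keywords)).mean())

# Per-keyword rate: share of rows containing each keyword individually
per_keyword = {
    kw: float(df["Value"].str.contains(kw, regex=False).mean())
    for kw in keywords
}

print(overall)      # 0.6
print(per_keyword)  # {'apple': 0.4, 'banana': 0.2}
```

Taking .mean() of the boolean mask gives the same result as .sum() / len(df).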

Or, if you want to keep the matching rows:

df[df['Value'].str.contains("|".join(keywords))]

  Name              Value
0    K          Ieatapple
1    Y  bananaisdelicious
4    C        appleislike
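To also see which keyword(s) matched in each kept row, one option is str.findall with the same pattern. This is a sketch assuming the question's sample data; the "Found" column name is hypothetical. It also shows that the sentence from the question's Edit, where several keywords are run together, is handled by substring matching.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["K", "Y", "B", "Q", "C"],
    "Value": ["Ieatapple", "bananaisdelicious", "orangelikesomething",
              "bluegrape", "appleislike"],
})
keywords = ["apple", "banana"]
pattern = "|".join(keywords)

# Keep only matching rows (copy to avoid SettingWithCopyWarning),
# then list every keyword found in each value ("Found" is a hypothetical name)
matched = df[df["Value"].str.contains(pattern)].copy()
matched["Found"] = matched["Value"].str.findall(pattern)
print(matched)

# The sentence from the question's Edit contains two keywords at once
sentence_hits = pd.Series(["Ilikeapplethanbananaandorange"]).str.findall(pattern).tolist()
print(sentence_hits)  # [['apple', 'banana']]
```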

More details

The pipe | is the OR operator in regular expressions, so we join our list of words with a pipe to match any one of them:

>>> keywords = ['apple', 'banana']
>>> "|".join(keywords)
'apple|banana'

So the regular expression now reads:

match rows where the sentence contains "apple" OR "banana"
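Because the joined string is treated as a regular expression, keywords containing metacharacters such as + or . would change its meaning (or raise an error). A common precaution, sketched here with hypothetical keywords and values, is to escape each keyword with re.escape before joining, and to pass case=False if capitalization should not matter.

```python
import re
import pandas as pd

# Hypothetical keywords and values, including a regex metacharacter and mixed case
keywords = ["apple", "c++", "Banana"]
values = pd.Series(["IlikeApple", "learnc++now", "bananaisdelicious", "bluegrape"])

# re.escape makes each keyword match literally ("+" is no longer a quantifier)
pattern = "|".join(map(re.escape, keywords))

# case=False makes the whole match case-insensitive
mask = values.str.contains(pattern, case=False)
print(mask.tolist())  # [True, True, True, False]
```

Without re.escape, the unescaped pattern "apple|c++|Banana" would fail to compile because of the repeated +.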
