How to count word or word_group occurrences in a string (phrase)

Question

I have keywords in a column of a dataframe (D1); they are 1-grams, 2-grams and, in some cases, 3-grams. I need to search for these n-grams in the Phrases column of another dataframe (D2) and count the occurrences of each n-gram, so that I can assign weights to them.

I tried nested loops, but that is too computationally expensive. The results I get are also disappointing: very short tokens such as 'a' and 'in' are getting matched as well.

import re
import pandas as pd
word_list = data['Words'].values.tolist()  # convert the keywords column into a list
s = pd.Series({w: pos_phrases.Phrases.str.contains(w, flags=re.IGNORECASE).sum() for w in word_list})  # number of phrases containing w as a substring

The phrases are in pos_phrases under Phrases. Some of the keywords are:

'high-fidelity', 'hi-fi', 'surgical', 'straight', 'true', 'dead on target','wide of the mark', etc.

The phrases read like a conversation between two people, e.g.:

Sample Phrase: "Hello Good evening, how are you, so can you point out the facts which lead to this eventful night"
Keywords to match: "Good evening", "eventful", "event"

Here, "event" must not match, because it only appears as part of "eventful". However, it is getting matched. I hope I have explained my requirement clearly.


Answers (1)
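
One possible approach, offered as a minimal sketch rather than a definitive solution: escape each keyword with re.escape and wrap it in lookarounds so it only matches when it is not surrounded by word characters, then count matches with Series.str.count. The sample dataframes and the helper name count_keyword below are hypothetical stand-ins; only the names data, Words, pos_phrases and Phrases come from the question.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the question's two dataframes.
data = pd.DataFrame({"Words": ["Good evening", "eventful", "event", "hi-fi"]})
pos_phrases = pd.DataFrame({"Phrases": [
    "Hello Good evening, how are you, so can you point out the facts "
    "which lead to this eventful night",
    "That hi-fi system made it a good evening",
]})

def count_keyword(phrases, keyword):
    """Count whole-word occurrences of keyword across a Series of phrases.

    count_keyword is a hypothetical helper, not part of pandas.
    """
    # re.escape makes the keyword literal (so '-' or '.' are not regex syntax).
    # The lookarounds reject matches that touch another word character,
    # so "event" does not match inside "eventful".
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return int(phrases.str.count(pattern, flags=re.IGNORECASE).sum())

counts = pd.Series(
    {w: count_keyword(pos_phrases["Phrases"], w) for w in data["Words"]}
)
print(counts)
```

With this sample data, "event" is counted 0 times while "eventful" is counted once, and "Good evening" is counted in both phrases because matching is case-insensitive. Unlike str.contains, str.count returns the number of occurrences per phrase, so a keyword that appears twice in one phrase contributes 2 to the total. This still makes one pass over the phrases per keyword, so it may remain slow for very large keyword lists.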
