Maxima

Reputation: 362

Match list of strings with a block of text

Beginner here:

I have a block of text:

For example: 'hey this is a block of text, for an example, wow looks cool blah blah blah angiotensin enzyme looks cool okay.But what about angiotensin enzym well I dont know.'

and a list of words: ['angiotensin enzyme serum', 'some diff enzyme', 'angiotensin enzyme a1']

My end goal is to find which strings from the list match, exactly or fuzzily, somewhere in the block of text.

What I have tried: difflib.get_close_matches

Required output: 'angiotensin enzyme serum', 'angiotensin enzyme a1'

Output order isn't a concern.

For other blocks of text, some other string from the list would match; the block isn't constant.

Is there a way to achieve this?

Upvotes: 0

Views: 359

Answers (1)

alani

Reputation: 13079

Using fuzzywuzzy (from PyPI):

from fuzzywuzzy import fuzz

text = 'hey this is a block of text, for an example, wow looks cool blah blah blah angiotensin enzyme looks cool okay.But what about angiotensin enzym well I dont know.'

words = ['angiotensin enzyme serum', 'some diff enzyme', 'angiotensin enzyme a1']

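# Keep each candidate whose best-matching substring of text scores above the threshold (scores run from 0 to 100)
matches = [w for w in words if fuzz.partial_ratio(text, w) > 70]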

Obviously you will want to adjust the threshold value to suit, but the values are well separated in this example:

>>> print(matches)
['angiotensin enzyme serum', 'angiotensin enzyme a1']

>>> for w in words:
...     print(w, fuzz.partial_ratio(text, w))
... 
angiotensin enzyme serum 83
some diff enzyme 56
angiotensin enzyme a1 90
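If installing a third-party package is not an option, a similar idea can be roughly sketched with only the standard library, using the difflib module the question already mentions. This is an approximation, not fuzzywuzzy's algorithm: it slides a window the length of each candidate across the text and keeps the best difflib.SequenceMatcher ratio. The helper name best_window_ratio and the 0.7 cutoff (on a 0 to 1 scale) are assumptions made for illustration, and the scores will not necessarily equal the ones shown above.

```python
from difflib import SequenceMatcher


def best_window_ratio(text, phrase):
    """Best SequenceMatcher ratio between phrase and any window of text
    that has the same length as phrase (hypothetical helper)."""
    n = len(phrase)
    if len(text) <= n:
        # Text is no longer than the phrase: compare them directly.
        return SequenceMatcher(None, text, phrase).ratio()
    # Compare the phrase against every same-length slice of the text.
    return max(
        SequenceMatcher(None, text[i:i + n], phrase).ratio()
        for i in range(len(text) - n + 1)
    )


text = 'hey this is a block of text, for an example, wow looks cool blah blah blah angiotensin enzyme looks cool okay.But what about angiotensin enzym well I dont know.'

words = ['angiotensin enzyme serum', 'some diff enzyme', 'angiotensin enzyme a1']

# 0.7 is an assumed cutoff; tune it for your own data.
matches = [w for w in words if best_window_ratio(text, w) > 0.7]
print(matches)
```

Checking every window is slow on long texts, so this is only reasonable for short blocks like the one in the question; for anything larger, a dedicated fuzzy-matching library is likely the better choice.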

Upvotes: 2
