Mike Barnes

Reputation: 4305

Python regular expression to search for words in a sentence

I'm still learning the ropes with Python and regular expressions, and I need some help, please! I need a regular expression that can search a sentence for specific words. I have managed to create a pattern that finds a single word, but how do I retrieve the other words I need to find? What would the re pattern look like to do this?

>>> import re
>>> question = "the total number of staff in 30?"
>>> re_pattern = r'\btotal.*?\b'
>>> m = re.findall(re_pattern, question)
>>> m
['total']

It must look for the words "total" and "staff". Thanks, Mike

Upvotes: 3

Views: 20558

Answers (3)

pemistahl

Reputation: 9584

Use the alternation operator | to search for all the words you need to find:

In [20]: re_pattern = r'\b(?:total|staff)\b'

In [21]: re.findall(re_pattern, question)
Out[21]: ['total', 'staff']

This matches your example above most closely. The \b anchors match at any transition between a word character and a non-word character, so the pattern still finds a word that is followed by a comma, a period, an exclamation mark or a question mark, as often happens at the end of main and subordinate clauses.

For example, in the question How many people are in your staff? the pattern finds the word staff, because there is a word boundary between the final f and the question mark. Do not leave out the second \b at the end of the regular expression, though: without it, the expression would wrongly detect words inside longer words, such as total in totally or totalities.
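A quick way to check how \b behaves next to punctuation and inside longer words is to run the pattern on a few sentences. This is only a small sketch; the variable names are made up for illustration.

```python
import re

# Same pattern as above: either word, delimited by word boundaries.
pattern = re.compile(r'\b(?:total|staff)\b')

# Punctuation is a non-word character, so a boundary exists before "?".
with_punctuation = pattern.findall('How many people are in your staff?')

# "totally" contains "total", but there is no boundary after the first "l".
inside_longer_word = pattern.findall('My staff is totally overworked.')

# Without the trailing \b, the prefix of "totally" is matched as well.
no_trailing_boundary = re.findall(r'\b(?:total|staff)', 'My staff is totally overworked.')

print(with_punctuation)      # ['staff']
print(inside_longer_word)    # ['staff']
print(no_trailing_boundary)  # ['staff', 'total']
```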

Another way to accomplish what you want is to extract all words (runs of alphanumeric characters) from your sentence first and then search this list for the words you need to find:

In [51]: def find_all_words(words, sentence):
....:     all_words = re.findall(r'\w+', sentence)
....:     words_found = []
....:     for word in words:
....:         if word in all_words:
....:             words_found.append(word)
....:     return words_found

In [52]: print(find_all_words(['total', 'staff'], 'The total number of staff in 30?'))
['total', 'staff']

In [53]: print(find_all_words(['total', 'staff'], 'My staff is totally overworked.'))
['staff']
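Note that the comparison above is case-sensitive, so 'Total' at the start of a sentence would not be found when searching for 'total'. If case should be ignored (an assumption, the question does not say), a variant could look like the sketch below. The name find_all_words_ci is hypothetical.

```python
import re

def find_all_words_ci(words, sentence):
    """Return the entries of `words` that occur as whole words in `sentence`, ignoring case."""
    # Lower-case every token once and keep them in a set for fast lookups.
    tokens = {token.lower() for token in re.findall(r'\w+', sentence)}
    # Preserve the order of `words` in the result.
    return [word for word in words if word.lower() in tokens]

print(find_all_words_ci(['total', 'staff'], 'Total number of STAFF?'))           # ['total', 'staff']
print(find_all_words_ci(['total', 'staff'], 'My staff is totally overworked.'))  # ['staff']
```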

Upvotes: 8

daya

Reputation: 1037

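import re

question = "the total number of staff in 30?"
find = ["total", "staff"]
# Raw string so that \w is passed to the regex engine unchanged.
words = re.findall(r"\w+", question)
# Keep only the search terms that appear as whole words.
result = [x for x in find if x in words]
print(result)  # ['total', 'staff']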

Upvotes: 2

Abhijit

Reputation: 63707

Have you thought about using something other than regex?

Consider this, and if it works, expand from this solution:

>>> 'total' in question.split()
True

Similarly

>>> words = ['total', 'staff']
>>> [e for e in words if e in question.split()]
['total', 'staff']
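One caveat: split() only separates on whitespace, so punctuation stays attached to the neighbouring word. The sketch below shows the effect and one possible workaround, stripping punctuation from each token with string.punctuation. The sentence and variable names are only illustrative.

```python
import string

sentence = 'How many people are in your staff?'

# Plain split() keeps the question mark attached to the last token.
raw_tokens = sentence.split()
print('staff' in raw_tokens)    # False, the token is 'staff?'

# Strip leading and trailing punctuation from each token before comparing.
clean_tokens = [token.strip(string.punctuation) for token in raw_tokens]
print('staff' in clean_tokens)  # True
```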

Upvotes: 1
