Reputation: 31
I have a function that takes in a string and a prefix word. I want to use re.findall to get a list of suffix words that follow the given prefix. For example:
string: "My mother gave my sister my robot."
prefix: "my"
result: ["mother", "sister", "robot"]
My implementation is
def suffix(txt, prefix):
    rv = re.findall(prefix + r' \w{4-15}',txt)
    rv = [i.replace(prefix,'') for i in rv]
    return rv
However, I am getting [] returned. Can someone suggest how I can implement this, using re?
Upvotes: 0
Views: 2350
Reputation:
A positive lookbehind fits here: it requires the prefix to precede the match without including it in the result.
Here is an example:
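import re

def find_suffix(word, string):
    # (?<=...) asserts that the prefix and one whitespace character come
    # right before the match, without making them part of the result
    pattern = r'(?<=' + word + r'\s)\w+'
    return re.findall(pattern, string)

print(find_suffix('my', "my mother gave my sister my robot."))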
output:
['mother', 'sister', 'robot']
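The example above uses an all-lowercase sentence. For the question's original sentence, which starts with "My", the same lookbehind can be combined with re.IGNORECASE. This is a minimal sketch: the name find_suffix_ci is hypothetical, and the re.escape call is an added precaution (not part of the answer above) for prefixes that contain regex metacharacters.

```python
import re

def find_suffix_ci(word, string):
    # re.escape is an added assumption: it keeps characters such as "." or "+"
    # in the prefix from being interpreted as regex syntax.
    pattern = r'(?<=' + re.escape(word) + r'\s)\w+'
    # re.IGNORECASE lets the lookbehind accept both "my" and "My".
    return re.findall(pattern, string, re.IGNORECASE)

print(find_suffix_ci('my', "My mother gave my sister my robot."))
# ['mother', 'sister', 'robot']
```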
Upvotes: 0
Reputation: 8769
Building on the other answers, here is a one-liner:
>>> s = "My mother gave my sister my robot."
>>> import re
>>> prefix = "my"
>>> re.findall(prefix + r'\s+(\w+)', s, re.IGNORECASE)
['mother', 'sister', 'robot']
>>>
Upvotes: 2
Reputation: 26315
Since @cdarke covered the main issues in your code, here is another approach: turn the string into a list of words by replacing every non-word character with a space using re.sub() and then calling split(). Then, for each word in the list whose preceding word equals prefix in any case, add that word to your resulting list.
Here is an example:
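import re

string = "My mother gave my sister my robot."
prefix = "my"

# Replace every non-word character with a space, then split into words
words = re.sub(r"[^\w]", " ", string).split()
# Start at index 1 so that words[i-1] never wraps around to the last word
suffixes = [words[i] for i in range(1, len(words)) if words[i-1].lower() == prefix]

print(suffixes)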
Which outputs:
['mother', 'sister', 'robot']
Note: To match prefix regardless of case, each word is converted to lowercase with lower() before the comparison. This assumes prefix itself is given in lowercase.
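The same word-list idea can also be written by pairing each word with its successor, so that no index can ever wrap around to the end of the list. This is only a sketch: the function name suffixes_by_split is hypothetical, and it uses re.findall(r"\w+", ...) to extract the words, which is a swapped-in alternative to the re.sub() and split() combination.

```python
import re

def suffixes_by_split(text, prefix):
    # Extract the words directly instead of replacing punctuation and splitting.
    words = re.findall(r"\w+", text)
    # zip(words, words[1:]) yields (previous word, next word) pairs.
    return [nxt for prev, nxt in zip(words, words[1:])
            if prev.lower() == prefix.lower()]

print(suffixes_by_split("My mother gave my sister my robot.", "my"))
# ['mother', 'sister', 'robot']
print(suffixes_by_split("robot is my", "my"))
# []
```

The second call shows that a prefix at the very end of the text has no following word and so contributes nothing to the result.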
Upvotes: 0
Reputation: 44364
There are several issues here. First, the range separator inside a quantifier is a comma, not a hyphen, so it should be {4,15} instead of {4-15}.
Second, you need to match both my and My, so the match should be case-insensitive (re.IGNORECASE).
Third, if you use a capturing group (the parentheses) with finditer, then you don't need to strip off the prefix afterwards.
Try this:
import re

def suffix(txt, prefix):
    rv = []
    # The capturing group holds only the word that follows the prefix
    for m in re.finditer(prefix + r' (\w{4,15})', txt, re.IGNORECASE):
        rv.append(m.group(1))
    return rv

print(suffix("My mother gave my sister my robot.", "my"))
Gives:
['mother', 'sister', 'robot']
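To see why the original pattern returned [] rather than raising an error, the short check below may help. In Python's re module, a brace expression that is not a valid quantifier, such as {4-15}, is treated as literal text, so the pattern only matches input that actually contains those characters. The variable names are illustrative only.

```python
import re

text = "My mother gave my sister my robot."

# "{4-15}" is not a valid quantifier, so it is matched as literal characters.
broken = re.findall(r"my \w{4-15}", text, re.IGNORECASE)
print(broken)   # []

# The same sub-pattern does match when the input contains the literal braces.
literal = re.findall(r"\w{4-15}", "x{4-15}")
print(literal)  # ['x{4-15}']

# With a comma, {4,15} is a real quantifier: 4 to 15 word characters.
fixed = re.findall(r"my (\w{4,15})", text, re.IGNORECASE)
print(fixed)    # ['mother', 'sister', 'robot']
```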
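Depending on your needs, matching on a word boundary (\b) together with non-word characters (\W+) might be better than a literal space to separate words. For example, "my, and your, stuff" would not match using a space. Note that \b is zero-width, so it cannot consume the separator on its own.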
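One way to handle separators other than a single space is sketched below. The function name suffix_any_separator is hypothetical, and the pattern is an assumption about what "better than a space" could look like: \b in front of the prefix keeps it from matching inside a longer word, and \W+ consumes whatever punctuation or whitespace sits between the prefix and the next word.

```python
import re

def suffix_any_separator(txt, prefix):
    # \b: the prefix must start at a word boundary (so "Tommy" does not count).
    # \W+: one or more non-word characters, e.g. ", " or several spaces.
    # re.escape is an added precaution for prefixes with regex metacharacters.
    pattern = r"\b" + re.escape(prefix) + r"\W+(\w+)"
    return re.findall(pattern, txt, re.IGNORECASE)

print(suffix_any_separator("my, and your, stuff", "my"))
# ['and']
print(suffix_any_separator("Tommy robot, my sister", "my"))
# ['sister']
```

The trade-off is that \W+ also skips over sentence-ending punctuation, so a prefix at the end of one sentence would pick up the first word of the next; whether that is acceptable depends on the input.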
Upvotes: 1