quinn

Reputation: 31

Given a prefix, get the words that follow it using Python re

I have a function that takes in a string and a prefix word. I want to use re.findall to get a list of suffix words that follow the given prefix. For example:

string: "My mother gave my sister my robot." 
prefix: "my"
result: ["mother", "sister", "robot"]

My implementation is

def suffix(txt, prefix):
    rv = re.findall(prefix + r' \w{4-15}',txt)
    rv = [i.replace(prefix,'') for i in rv]
    return rv

However, I am getting [] returned. Can someone suggest how I can implement this, using re?

Upvotes: 0

Views: 2350

Answers (4)

user9158931

This is a job for a positive lookbehind, which asserts that the prefix comes right before the match without including it in the result:

import re

def find_suffix(word, string):
    # Raw strings keep \s and \w intact; re.escape treats the prefix as literal text.
    pattern = r'(?<=' + re.escape(word) + r'\s)\w+'
    return re.findall(pattern, string)

print(find_suffix('my', "my mother gave my sister my robot."))

output:

['mother', 'sister', 'robot']
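
A possible variation, only a sketch: it assumes the prefix should match as a whole word and in any case, so that the capitalised "My" from the question is also picked up. The name find_suffix_ci and the extra "Tommy runs." sentence are made up for illustration. Python's lookbehind needs a fixed-width pattern, which is why it sticks to a single \s rather than \s+.

```python
import re

def find_suffix_ci(word, string):
    # The leading \b stops "my" from matching at the end of a longer
    # word such as "Tommy"; re.IGNORECASE lets "My" match as well.
    pattern = r'(?<=\b' + re.escape(word) + r'\s)\w+'
    return re.findall(pattern, string, re.IGNORECASE)

print(find_suffix_ci('my', "My mother gave my sister my robot. Tommy runs."))
```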

Upvotes: 0

riteshtch

Reputation: 8769

Building on the other answers, here is a one-liner:

>>> s = "My mother gave my sister my robot."
>>> import re
>>> prefix = "my"
>>> re.findall(prefix + r'\s+(\w+)', s, re.IGNORECASE)
['mother', 'sister', 'robot']
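
One caveat, offered as a sketch rather than as part of the one-liner itself: if the prefix can contain regex metacharacters, wrapping it in re.escape() keeps them literal. The prefix "mr." and the sample sentence below are invented for illustration.

```python
import re

s = "Mr. Smith met mr. Jones, but Mrs Brown was away."
prefix = "mr."

# Without escaping, the "." in the prefix matches any character,
# so "Mrs" followed by a space also counts as the prefix.
unescaped = re.findall(prefix + r'\s+(\w+)', s, re.IGNORECASE)

# re.escape() makes the "." literal, so only "Mr." and "mr." qualify.
escaped = re.findall(re.escape(prefix) + r'\s+(\w+)', s, re.IGNORECASE)

print(unescaped)
print(escaped)
```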

Upvotes: 2

RoadRunner

Reputation: 26315

Since @cdarke has covered the main issues with your code, here is another approach: replace the punctuation with spaces using re.sub(), split the string into a list of words, and collect every word whose preceding word equals the prefix, ignoring case.

Here is an example:

import re

string = "My mother gave my sister my robot."
prefix = "my"

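# Replace every non-word character with a space, then split into words.
words = re.sub(r"\W", " ", string).split()

# Start at index 1: at i == 0, words[i-1] would wrap around to the last word.
suffixes = [words[i] for i in range(1, len(words)) if words[i-1].lower() == prefix]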

print(suffixes)

Which outputs:

['mother', 'sister', 'robot']

Note: lower() converts each word to lowercase before the comparison, so the check matches the prefix in any case (the prefix itself is assumed to be lowercase already).
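
The same idea can also be written without index arithmetic by pairing each word with its successor. This is just a sketch of an alternative, not something from the answers above; it lowercases the prefix too, so a capitalised prefix would still work.

```python
import re

string = "My mother gave my sister my robot."
prefix = "my"

# Pull out the words directly instead of stripping punctuation first.
words = re.findall(r'\w+', string)

# zip(words, words[1:]) yields (previous word, next word) pairs.
suffixes = [nxt for prev, nxt in zip(words, words[1:])
            if prev.lower() == prefix.lower()]

print(suffixes)
```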

Upvotes: 0

cdarke

Reputation: 44364

There are several issues here. First, the range separator inside a quantifier is a comma, not a hyphen, so use {4,15} instead of {4-15}.

Second, you need to match both my and My, so the match should be case insensitive (re.IGNORECASE).

Third, if you use a capturing group (the round brackets) with finditer, you don't need to hack off the prefix afterwards.
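
To illustrate the first point: Python's re does not raise an error for {4-15}. It falls back to treating the braces as literal characters, which is why the original pattern quietly returns []. A small check, using a made-up "x{4-15}" string to show the literal match:

```python
import re

txt = "My mother gave my sister my robot."

# "{4-15}" is not a valid quantifier, so it is matched as literal text.
hyphen_result = re.findall(r'my \w{4-15}', txt)    # txt never contains "{4-15}"
literal_result = re.findall(r'\w{4-15}', "x{4-15}")  # the literal braces match here

# With a comma, {4,15} is a real quantifier: 4 to 15 word characters.
# No IGNORECASE yet, so the capitalised "My mother" is still missed.
comma_result = re.findall(r'my \w{4,15}', txt)

print(hyphen_result)
print(literal_result)
print(comma_result)
```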

Try this:

import re

def suffix(txt, prefix):
    rv = []
    # The parentheses capture only the word that follows the prefix.
    for m in re.finditer(prefix + r' (\w{4,15})', txt, re.IGNORECASE):
        rv.append(m.group(1))

    return rv

print(suffix("My mother gave my sister my robot.", "my"))

Gives:

['mother', 'sister', 'robot']

Depending on needs, \b (word boundary) might be better than a space to separate words. For example: "my, and your, stuff" would not match using a space.
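
A rough sketch of that idea, assuming any run of non-word characters may separate the prefix from the next word. Since \b is zero-width, it cannot consume the separator by itself, so here it is paired with \W+. The name suffix_any_sep is made up, and the length limit from the question is dropped for brevity.

```python
import re

def suffix_any_sep(txt, prefix):
    # \b anchors the prefix at the start of a word;
    # \W+ consumes spaces, commas and other separators.
    pattern = r'\b' + re.escape(prefix) + r'\W+(\w+)'
    return re.findall(pattern, txt, re.IGNORECASE)

print(suffix_any_sep("my, and your, stuff", "my"))
print(suffix_any_sep("My mother gave my sister my robot.", "my"))
```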

Upvotes: 1
