rbc-2019

Reputation: 165

Use regex to extract characters after a substring in python

I have a string that looks something like this:

text = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'

I need to extract the word that follows a particular substring, in this case 'sky is'. That is, I want this list:

['blue', 'orange']

I have tried the following:

import re

p1 = re.compile(r"is (.+?) ", re.I)
re.findall(p1, text)

But this only gives

['blue.']

If, however, my text is

text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange or yellow.'

and I run

p1 = re.compile(r"is (.+?) ", re.I)
re.findall(p1, text)

I get:

['blue', 'orange']

Please help! I am new to regular expressions and I am stuck!

Upvotes: 0

Views: 2483

Answers (2)

singrium

Reputation: 3016

In your regex pattern, the group can only end at a blank space, but 'orange' is followed by a period '.', so it is never captured (and the period after 'blue' ends up inside the first match).
You have to include the period '.' in your pattern.

p1 = re.compile(r"is (.+?)[ \.]", re.I)
re.findall(p1,text)
# ['blue', 'orange']

Demo:
https://regex101.com/r/B8jhdF/2
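A quick way to see the difference is to run both patterns on the first sample string side by side. This is a small sketch; the names original, fixed, before and after are only for illustration. Note that [ .] and [ \.] are equivalent, since a period inside a character class is already literal.

```python
import re

text = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'

# Original pattern: the lazy group can only end at a space, so it swallows
# the period after 'blue' and finds no match for 'orange.' at the very end.
original = re.compile(r"is (.+?) ", re.I)
before = original.findall(text)
print(before)  # ['blue.']

# Fixed pattern: the group may end at either a space or a period.
fixed = re.compile(r"is (.+?)[ .]", re.I)
after = fixed.findall(text)
print(after)  # ['blue', 'orange']
```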

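EDIT:
If the word is at the end of the string and not followed by a period '.', you can also accept the end of the string ($) as a terminator. Because this pattern has two groups, findall returns tuples, so keep only the first element of each: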

text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange'
p1 = re.compile(r"is (.+?)([ \.]|$)")
found_patterns = re.findall(p1,text)
[elt[0] for elt in found_patterns]
# ['blue', 'orange']
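A variant that avoids the tuples is to put the terminator in a lookahead, (?=[ .]|$), which checks what follows the word without consuming or capturing it, so findall returns plain strings. This is a sketch rather than the code above; the name words is just for illustration.

```python
import re

text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange'

# (?=[ .]|$) requires a space, a period, or the end of the string after the
# word, but it is not a capturing group, so findall yields strings, not tuples.
pattern = re.compile(r"is (.+?)(?=[ .]|$)")
words = pattern.findall(text)
print(words)  # ['blue', 'orange']
```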

Upvotes: 1

LukasNeugebauer

Reputation: 1337

It's not a very general solution, but it works for your string.

import re

my_str = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'
r = re.compile('sky is [a-z]+')
out = [x.split()[-1] for x in r.findall(my_str)]
# ['blue', 'orange']
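To make this a bit more general, the phrase can be passed in and escaped with re.escape, and a capturing group returns the word directly so no split() is needed. The helper name words_after is hypothetical, and \w+ is an assumption about what counts as a word (it also matches digits and underscores).

```python
import re

def words_after(text, phrase):
    """Return the word that follows each occurrence of phrase in text (hypothetical helper)."""
    # re.escape makes any regex metacharacters in phrase literal;
    # \s+ allows any whitespace before the word, and (\w+) captures the word itself.
    pattern = re.compile(re.escape(phrase) + r"\s+(\w+)", re.IGNORECASE)
    return pattern.findall(text)

my_str = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'
print(words_after(my_str, 'sky is'))  # ['blue', 'orange']
```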

Upvotes: 1
