Reputation: 165
I have a string that looks something like this:
text = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'
I need to extract the word that follows a particular substring, in this case 'sky is'. That is, I want this list:
['blue', 'orange']
I have tried the following:
import re
p1 = re.compile(r"is (.+?) ", re.I)
re.findall(p1, text)
But this only gives:
['blue']
If, however, my text is
text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange or yellow.'
and I run
p1 = re.compile(r"is (.+?) ",re.I)
re.findall(p1,text)
I get:
['blue', 'orange']
Please help! I am new to regular expressions and I am stuck!
Upvotes: 0
Views: 2483
Reputation: 3016
In your regex pattern, you only capture text that is followed by a space. However, 'orange' is followed by a dot '.', which is why it is not captured.
You have to allow the dot '.' as a terminator in your pattern as well:
p1 = re.compile(r"is (.+?)[ \.]", re.I)
re.findall(p1,text)
# ['blue', 'orange']
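A variant of the same idea, offered as a sketch rather than a drop-in replacement: instead of listing every possible terminator (space, dot, comma, ...), capture only word characters with `\w+`, so the match stops at any punctuation or at the end of the string. Anchoring on the full phrase `sky is` with `\b` also avoids accidental hits on words that merely end in "is" (such as "this").

```python
import re

text = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'

# \b makes "sky" start on a word boundary; (\w+) captures the following word
# and stops at the first non-word character ('.', ',', space) or end of string.
p1 = re.compile(r"\bsky is (\w+)", re.I)
print(p1.findall(text))  # ['blue', 'orange']
```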
Demo:
https://regex101.com/r/B8jhdF/2
EDIT:
If the word is at the end of the string and not followed by a dot '.', I suggest this:
text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange'
p1 = re.compile(r"is (.+?)([ \.]|$)")
found_patterns = re.findall(p1,text)
[elt[0] for elt in found_patterns]
# ['blue', 'orange']
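Alternatively (a small refinement of the same pattern), the terminator group can be made non-capturing with `(?:...)`. `re.findall` then returns the single remaining captured group as plain strings, so the list comprehension over tuples is no longer needed:

```python
import re

text = 'during the day, the color of the sky is blue at sunset, the color of the sky is orange'

# (?:[ .]|$) groups the possible terminators without capturing them,
# so findall returns only the word captured by (.+?).
p1 = re.compile(r"is (.+?)(?:[ .]|$)")
found = p1.findall(text)
print(found)  # ['blue', 'orange']
```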
Upvotes: 1
Reputation: 1337
It's not a very general solution, but it works for your string.
import re
my_str = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'
r = re.compile('sky is [a-z]+')
out = [x.split()[-1] for x in r.findall(my_str)]
# ['blue', 'orange']
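If you need something more general than a hard-coded phrase, one possible approach is sketched below. The helper name `words_after` is hypothetical; it escapes an arbitrary phrase with `re.escape` and captures the word that follows each occurrence, so no `split()` post-processing is needed.

```python
import re


def words_after(text, phrase):
    # Hypothetical helper: return the word following each occurrence of phrase.
    # re.escape makes characters such as '.' or '?' in phrase match literally.
    # \b assumes phrase starts with a word character; \s+ tolerates extra
    # spaces; (\w+) captures the next word and stops at punctuation.
    pattern = r"\b" + re.escape(phrase) + r"\s+(\w+)"
    return re.findall(pattern, text, flags=re.I)


my_str = 'during the day, the color of the sky is blue. at sunset, the color of the sky is orange.'
print(words_after(my_str, 'sky is'))  # ['blue', 'orange']
```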
Upvotes: 1