Rahul Agarwal

Reputation: 4100

Capture the string between two words, but only the first time

I have a string like:

 text = "Why do Humans need to eat food? Humans eat food to survive."

I want to capture everything between Human and food, but only the first occurrence.

Expected Output

Humans need to eat food

My Regex:

p =r'(\bHumans?\b.*?\bFoods?\b)'

Python Code:

re.findall(p, text, re.I|re.M|re.DOTALL)

The code correctly captures the string between Human and Food, but it doesn't stop at the first capture.

Research:

I have read that adding ? makes a quantifier non-greedy, but I can't figure out where to put it. None of the permutations and combinations I tried stopped at the first match.

Update

I am writing many regexes like this to capture various other entities and parsing them all in one pass, so I can't change my re.findall logic.

Upvotes: 1

Views: 75

Answers (3)

han solo

Reputation: 6590

Try this:

>>> import re
>>> text = "Why do Humans need to eat food? Humans eat food to survive."
>>> re.search(r'Humans.*?food', text).group() # you want the all powerful non-greedy '?' :)
'Humans need to eat food'
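
To make the role of the ? concrete, here is a small sketch comparing the greedy and non-greedy forms (the variable names are only illustrative). The quantifier controls how far each match extends, not how many matches findall returns, which is why the non-greedy pattern alone still yields two results with findall.

```python
import re

text = "Why do Humans need to eat food? Humans eat food to survive."

# Greedy: .* runs to the LAST "food", so there is one long match.
greedy = re.findall(r'Humans.*food', text)

# Non-greedy: .*? stops at the NEAREST "food", so findall reports
# every non-overlapping match, one per sentence here.
lazy = re.findall(r'Humans.*?food', text)

# search returns only the first match (or None if there is none).
first = re.search(r'Humans.*?food', text)

print(greedy)
print(lazy)
print(first.group())
```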

Upvotes: 1

Pushpesh Kumar Rajwanshi

Reputation: 18357

For finding only the first match, Toto's answer is best. But since you said you must keep using findall, you can append .* to the end of your regex so that it consumes the remaining text, which leaves nothing for any further match.

(\bHumans?\b.*?\bFoods?\b).*
                          ^^ This consumes the rest of your text (given re.DOTALL), so there can be no further matches.

Sample Python code:

import re

text = "Why do Humans need to eat food? Humans eat food to survive."
p =r'(\bHumans?\b.*?\bFoods?\b).*'
print(re.findall(p, text, re.I|re.M|re.DOTALL))

Prints:

['Humans need to eat food']
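
One caveat worth checking: the trailing .* only swallows the whole remainder when . is allowed to match newlines. A minimal sketch, assuming a hypothetical two-line variant of the same text (two_lines is not from the question), shows the difference with and without re.DOTALL:

```python
import re

# Hypothetical input: the same sentences, split across two lines.
two_lines = "Why do Humans need to eat food?\nHumans eat food to survive."

p = r'(\bHumans?\b.*?\bFoods?\b).*'

# With DOTALL, the trailing .* crosses the newline and consumes
# everything after the first capture, so only one match remains.
with_dotall = re.findall(p, two_lines, re.I | re.DOTALL)

# Without DOTALL, .* stops at the end of the first line, and findall
# goes on to match again on the second line.
without_dotall = re.findall(p, two_lines, re.I)

print(with_dotall)
print(without_dotall)
```

So the flags in the question already cover this case; the trick would likely stop working if re.DOTALL were dropped on multi-line input.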

Upvotes: 3

Toto

Reputation: 91385

Use search instead of findall:

import re
text = "Why do Humans need to eat food? Humans eat food to survive."
p =r'(\bHumans?\b.*?\bFoods?\b)'
res = re.search(p, text, re.I|re.M|re.DOTALL)
print(res.groups())

Output:

('Humans need to eat food',)

Or add .* at the end of the regex:

import re
text = "Why do Humans need to eat food? Humans eat food to survive."
p =r'(\bHumans?\b.*?\bFoods?\b).*'
#                      here ___^^
res = re.findall(p, text, re.I|re.M|re.DOTALL)
print(res)
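
If the surrounding code expects a list, as findall returns, the search approach could also be wrapped so that callers stay unchanged. This is only a sketch: find_first is a hypothetical helper name, and it assumes every pattern has at least one capturing group, as the one in the question does.

```python
import re

def find_first(pattern, text, flags=0):
    """Hypothetical helper: return a list holding only the first
    match's group 1, or an empty list when nothing matches."""
    m = re.search(pattern, text, flags)
    return [m.group(1)] if m else []

text = "Why do Humans need to eat food? Humans eat food to survive."
p = r'(\bHumans?\b.*?\bFoods?\b)'
flags = re.I | re.M | re.DOTALL

print(find_first(p, text, flags))
print(find_first(p, "nothing relevant here", flags))
```

The return shape mirrors what findall gives for a single-group pattern, so code that iterates over the result should not need to change.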

Upvotes: 5
