s = 'A boy is playing and he is wearing shirt.'
My regex is ((?:\S+\s+)\bis\b)
My output: ['boy is', 'he is']
Expected output: ['boy','he']
Upvotes: 1
Views: 70
Reputation: 7353
To keep the solution closest to what you already have, you could replace the 'is' with '' and then strip off any whitespace left, using a list comprehension over the re.findall result.
import re
s = 'A boy is playing and he is wearing shirt.'
# Match each word followed by "is", then drop the "is" and trim the leftover whitespace.
print([x.replace('is', '').strip() for x in re.findall(r'\s*([a-zA-Z]+\s+is\b)', s)])
Output:
['boy', 'he']
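One thing to watch with the replace step: str.replace removes every 'is' substring, so a word that itself contains "is" gets damaged. A small sketch of the difference, using a made-up sentence (not from the question) and a re.sub that only strips the trailing "is" as one possible alternative:

```python
import re

# Hypothetical input chosen so that a matched word contains "is".
s = 'This is a test and he is here.'

# Each match is "<word> is".
matches = re.findall(r'[a-zA-Z]+\s+is\b', s)

# str.replace removes every "is", including the one inside "This".
naive = [m.replace('is', '').strip() for m in matches]

# Removing only the trailing whitespace + "is" leaves the word intact.
safer = [re.sub(r'\s+is$', '', m) for m in matches]

print(naive)   # ['Th', 'he']
print(safer)   # ['This', 'he']
```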
Upvotes: 0
Reputation: 147206
You should change your regex to use a lookahead:
\S+(?=\s+is\b)
In Python:
import re
s = 'A boy is playing and he is wearing shirt.'
print(re.findall(r'\S+(?=\s+is\b)', s))
Output:
['boy', 'he']
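To see what the lookahead is doing, here is a short sketch (the second test string is made up for illustration). The lookahead checks for whitespace plus "is" without consuming it, so each match covers only the word, and the trailing \b keeps words that merely start with "is" from counting:

```python
import re

s = 'A boy is playing and he is wearing shirt.'
pattern = re.compile(r'\S+(?=\s+is\b)')

# Each match spans only the word; " is" is checked but not consumed.
spans = [(m.group(), m.span()) for m in pattern.finditer(s)]
print(spans)   # [('boy', (2, 5)), ('he', (21, 23))]

# The trailing \b rejects words that only begin with "is".
no_match = pattern.findall('he island')
print(no_match)   # []
```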
Upvotes: 2
Reputation: 785406
You may reorganize your capture group a bit to keep the word is outside the group and use re.findall:
>>> import re
>>> s = 'A boy is playing and he is wearing shirt.'
>>> re.findall(r'(\S+)\s+is\b', s)
['boy', 'he']
findall returns only the captured group if there is one in your regex.
Also note that there is no need to use \b (word boundary) right after matching whitespace, since a boundary is already guaranteed between the whitespace and is.
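A quick sketch of both points, using the sentence from the question. The variable names are just for illustration:

```python
import re

s = 'A boy is playing and he is wearing shirt.'

# No capture group: findall returns the whole match.
whole = re.findall(r'\S+\s+is\b', s)

# One capture group: findall returns only what the group matched.
grouped = re.findall(r'(\S+)\s+is\b', s)

# Adding \b between \s+ and "is" changes nothing here.
with_boundary = re.findall(r'(\S+)\s+\bis\b', s)

print(whole)                     # ['boy is', 'he is']
print(grouped)                   # ['boy', 'he']
print(with_boundary == grouped)  # True
```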
Upvotes: 3