user11727742

Why is my code not working? I want to find the word before the match

s = 'A boy is playing and he is wearing shirt.'

My regex is ((?:\S+\s+)\bis\b)

My output: ['boy is', 'he is']

Expected output: ['boy','he']

Upvotes: 1

Views: 70

Answers (3)

CypherX

Reputation: 7353

Solution

To stay closest to your original approach, you could remove the trailing 'is' from each match, together with the whitespace before it, using a list comprehension over the re.findall result.

import re

s = 'A boy is playing and he is wearing shirt.'
# Match "<word> is", then drop the trailing " is" from each match.
# Anchoring the removal with $ leaves any "is" inside the word itself untouched.
[re.sub(r'\s+is$', '', x) for x in re.findall(r'[a-zA-Z]+\s+is\b', s)]

Output:

['boy', 'he']
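
A note on the post-processing step: a plain str.replace('is', '') removes every occurrence of "is", including those inside the word you want to keep. The sketch below illustrates this on a made-up sentence (the input string and the variable names are assumptions for illustration, not from the question):

```python
import re

# Hypothetical input where the words before "is" themselves contain "is".
s = 'This is a test and Chris is here.'

matches = re.findall(r'[a-zA-Z]+\s+is\b', s)  # ['This is', 'Chris is']

# str.replace removes every "is", including the ones inside the words.
naive = [m.replace('is', '').strip() for m in matches]

# Anchoring the removal to the end of the match keeps the word intact.
anchored = [re.sub(r'\s+is$', '', m) for m in matches]

print(naive)     # ['Th', 'Chr']
print(anchored)  # ['This', 'Chris']
```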

Upvotes: 0

Nick

Reputation: 147206

You should change your regex to use a lookahead:

\S+(?=\s+is\b)

In Python:

import re

s = 'A boy is playing and he is wearing shirt.'
print(re.findall(r'\S+(?=\s+is\b)', s))

Output:

['boy', 'he']
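
To see why the lookahead helps, here is a small side-by-side sketch of the pattern from the question and the lookahead version (the variable names are only for illustration):

```python
import re

s = 'A boy is playing and he is wearing shirt.'

# Pattern from the question: the capture group spans the preceding word
# AND "is", so findall returns both.
with_is = re.findall(r'((?:\S+\s+)\bis\b)', s)

# Lookahead version: (?=...) only asserts that whitespace + "is" follows;
# it does not consume those characters, so they are not part of the match.
without_is = re.findall(r'\S+(?=\s+is\b)', s)

print(with_is)     # ['boy is', 'he is']
print(without_is)  # ['boy', 'he']
```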

Upvotes: 2

anubhava

Reputation: 785406

You can reorganize your capture group so that the word is stays outside the group, and then use re.findall:

>>> import re
>>> s = 'A boy is playing and he is wearing shirt.'
>>> re.findall(r'(\S+)\s+is\b', s)
['boy', 'he']

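When the regex contains a capture group, re.findall returns only the text captured by that group rather than the whole match.

Also note that the \b (word boundary) before is in your regex is unnecessary: after \s+, the following i always starts a word, so the boundary is already guaranteed. The \b after is still matters, because it keeps words such as island from matching.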
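
A short sketch of how re.findall behaves with zero, one, and two capture groups, plus a check of the word-boundary remarks (the 'the big island' string is a made-up example, not from the question):

```python
import re

s = 'A boy is playing and he is wearing shirt.'

# No capture group: findall returns the whole match.
no_group = re.findall(r'\S+\s+is\b', s)         # ['boy is', 'he is']

# One capture group: findall returns only that group.
one_group = re.findall(r'(\S+)\s+is\b', s)      # ['boy', 'he']

# Two capture groups: findall returns one tuple per match.
two_groups = re.findall(r'(\S+)\s+(is)\b', s)   # [('boy', 'is'), ('he', 'is')]

# A \b between \s+ and "is" changes nothing: whitespace followed by a
# word character is always a word boundary.
leading_b = re.findall(r'(\S+)\s+\bis\b', s)    # ['boy', 'he']

# The trailing \b does matter: without it, "island" also matches.
no_trailing_b = re.findall(r'(\S+)\s+is', 'the big island')      # ['big']
with_trailing_b = re.findall(r'(\S+)\s+is\b', 'the big island')  # []
```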

Upvotes: 3
