nu everest

Reputation: 10249

Python Regex to findall Delimited Substrings

I have a string:

test2 = "-beginning realization 5 -singlespace -multispaceafter  not-this-one -whitespace\t\n -end"

I want to find all of the substrings that begin with a minus sign (-).

I can find all but the last occurrence:

re.findall(ur"\B-(.*?)\s", test2)

returns [u'beginning', u'singlespace', u'multispaceafter', u'whitespace']

I can also find the last occurrence on its own:

re.findall(ur"\B-(.*?)\Z", test2)

returns [u'end']

However, I want a regex that returns

[u'beginning', u'singlespace', u'multispaceafter', u'whitespace', u'end']

Upvotes: 0

Views: 136

Answers (4)

user557597

The last item doesn't match because your regex requires a whitespace character after each substring, and "-end" is followed by the end of the string instead.

Try:

 # (?:^|\s)-(.*?)(?=\s|$)

 (?: ^ | \s )
 -
 ( .*? )
 (?= \s | $ )
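
As a rough sketch of how the expanded layout above could be used from Python: the spaced-out form only works as a pattern when the re.VERBOSE flag is passed, since otherwise the spaces are matched literally. This assumes Python 3 (so no ur prefix), and the names pattern and result are only for illustration.

```python
import re

test2 = "-beginning realization 5 -singlespace -multispaceafter  not-this-one -whitespace\t\n -end"

# re.VERBOSE makes the engine ignore whitespace in the pattern and allows comments.
pattern = re.compile(r"""
    (?: ^ | \s )     # start of string or a whitespace character before the dash
    -                # the literal minus sign
    ( .*? )          # capture lazily
    (?= \s | $ )     # stop before whitespace or end of string, without consuming it
""", re.VERBOSE)

result = pattern.findall(test2)
print(result)  # ['beginning', 'singlespace', 'multispaceafter', 'whitespace', 'end']
```

Because the trailing condition is a lookahead, the whitespace after one match is still available as the leading whitespace of the next one.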

Upvotes: 1

vks

Reputation: 67968

(?<=\s)-(.*?)(?=\s|$)|(?<=^)-(.*?)(?=\s|$)

Try this. See the demo:

http://regex101.com/r/cN7qZ7/6
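
One thing to be aware of when using this pattern with re.findall in Python: it contains two capturing groups (one per alternative), so findall returns a tuple per match, with an empty string for the group that did not take part. A minimal sketch, assuming Python 3; the names pairs and flat are illustrative only:

```python
import re

test2 = "-beginning realization 5 -singlespace -multispaceafter  not-this-one -whitespace\t\n -end"

# Two alternatives, each with its own capturing group, so findall yields 2-tuples.
pairs = re.findall(r"(?<=\s)-(.*?)(?=\s|$)|(?<=^)-(.*?)(?=\s|$)", test2)

# For this input exactly one group per tuple is non-empty, so keep that one.
flat = [a or b for a, b in pairs]
print(flat)  # ['beginning', 'singlespace', 'multispaceafter', 'whitespace', 'end']
```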

Upvotes: 1

hwnd

Reputation: 70732

You can use a non-capturing group to match either a whitespace character or the end of the string after the substring.

>>> re.findall(r'\B-(.*?)(?:\s|$)', test2)

However, instead of \B and the non-capturing group, I recommend a negative lookbehind combined with \S+:

>>> re.findall(r'(?<!\S)-(\S+)', test2)
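
For reference, a small self-contained sketch that runs both patterns against the string from the question. It assumes Python 3, and with_group and with_lookbehind are names made up for this example.

```python
import re

test2 = "-beginning realization 5 -singlespace -multispaceafter  not-this-one -whitespace\t\n -end"

# \B rejects the dashes inside "not-this-one"; the trailing group accepts
# either a whitespace character or the end of the string.
with_group = re.findall(r'\B-(.*?)(?:\s|$)', test2)

# (?<!\S) requires that the dash is not preceded by a non-whitespace character;
# \S+ then captures the run of non-whitespace characters that follows.
with_lookbehind = re.findall(r'(?<!\S)-(\S+)', test2)

print(with_group)       # ['beginning', 'singlespace', 'multispaceafter', 'whitespace', 'end']
print(with_lookbehind)  # ['beginning', 'singlespace', 'multispaceafter', 'whitespace', 'end']
```

The two are not fully equivalent: a lone dash followed by whitespace yields an empty string with the first pattern and no match with the second, because \S+ needs at least one character.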

Upvotes: 3

Avinash Raj

Reputation: 174716

You could also try the code below:

>>> test2 = "-beginning realization 5 -singlespace -multispaceafter  not-this-one -whitespace\t\n -end"
>>> m = re.findall(r'(?:\s|^)-(\S+)', test2)
>>> m
['beginning', 'singlespace', 'multispaceafter', 'whitespace', 'end']

Upvotes: 2
