Nick

Reputation: 21

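Regex: split a string on delimiters and keep them in the result

Here is my example string. I need to split it into words while keeping the delimiters: a dot or dash should stay attached to the word before it, and each space should become its own item.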

str example:

   a = 'Beautiful. is. better5-than ugly'

What I tried:

re.split(r'\W+', a)
['Beautiful', 'is', 'better5', 'than', 'ugly']

expected output:

 ['Beautiful.', ' ', 'is.', ' ', 'better5-', 'than', ' ', 'ugly']

Is it possible?

Upvotes: 1

Views: 277

Answers (2)

Czaporka

Reputation: 2407

>>> import re
>>> a = 'Beautiful. is. better5-than ugly'
>>> re.findall(r"\w+[.-]?|\s+", a)
['Beautiful.', ' ', 'is.', ' ', 'better5-', 'than', ' ', 'ugly']
  • \w+[.-]? matches words with an optional dot or hyphen at the end.
  • \s+ matches whitespace.
  • | is alternation, so each match is either a word (with its optional trailing dot or hyphen) or a run of whitespace.
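
As a minimal sketch of the same idea wrapped in a function: the name tokenize and the round-trip check are my additions, not part of the answer above. Note that findall silently skips any character that neither alternative matches (a comma, for example), so comparing the joined tokens with the input is a cheap way to notice lost characters.

```python
import re

# Same pattern as in the answer above; TOKEN_RE and tokenize are hypothetical names.
TOKEN_RE = re.compile(r"\w+[.-]?|\s+")

def tokenize(text):
    """Return words (with an optional trailing '.' or '-') and whitespace runs."""
    return TOKEN_RE.findall(text)

a = 'Beautiful. is. better5-than ugly'
tokens = tokenize(a)
print(tokens)

# True only if findall did not skip any character of the input.
print(''.join(tokens) == a)

# A comma is matched by neither alternative, so it is dropped.
print(tokenize('a, b'))
```

If other punctuation has to be preserved, the character class [.-] would need to be extended accordingly.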

Upvotes: 2

S.B

Reputation: 16476

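Since the delimiters have to stay in the result, the split must happen at zero-width positions rather than on the delimiter characters themselves. I used both a "lookbehind" and a "lookahead" assertion for that; you can read about them in the re module's documentation. Note that splitting on a pattern that can match an empty string requires Python 3.7 or later.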

import re
a = 'Beautiful. is. better5-than ugly'
print(re.split(r'(?<=[-. ])|(?= )', a))

Additional note: the "lookbehind" assertion alone gives almost the same result, but it leaves the space attached to the word "than" (producing "than "). The "lookahead" alternative, |(?= ), adds a split point before each space, so that space becomes its own item too.
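
To make the difference concrete, here is a small comparison of the two patterns on the same input. The variable names are only for illustration, and this assumes Python 3.7+, where re.split accepts patterns that match an empty string.

```python
import re

a = 'Beautiful. is. better5-than ugly'

# Lookbehind only: split after '-', '.' or ' '.
# The space after "than" stays attached to the word.
behind_only = re.split(r'(?<=[-. ])', a)
print(behind_only)
# ['Beautiful.', ' ', 'is.', ' ', 'better5-', 'than ', 'ugly']

# Lookbehind plus lookahead: also split before every space.
both = re.split(r'(?<=[-. ])|(?= )', a)
print(both)
# ['Beautiful.', ' ', 'is.', ' ', 'better5-', 'than', ' ', 'ugly']
```

Because both assertions are zero-width, no character is consumed by the split, and joining the pieces gives back the original string.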

Upvotes: 2
