Reputation: 21
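Here is my example string. I need to split it but keep the delimiters (dot, dash, and space): a dot or dash should stay attached to the word before it, and each space should be its own item.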
str example:
a = 'Beautiful. is. better5-than ugly'
What I tried:
re.split(r'\W+', a)
['Beautiful', 'is', 'better5', 'than', 'ugly']
expected output:
['Beautiful.', ' ', 'is.', ' ', 'better5-', 'than', ' ', 'ugly']
Is it possible?
Upvotes: 1
Views: 277
Reputation: 2407
>>> import re
>>> a = 'Beautiful. is. better5-than ugly'
>>> re.findall(r"\w+[.-]?|\s+", a)
['Beautiful.', ' ', 'is.', ' ', 'better5-', 'than', ' ', 'ugly']
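One caveat with findall: any character that neither alternative of the pattern matches is silently skipped. Below is a minimal sketch that wraps the same pattern in a helper and checks whether the tokens still join back to the input. The names TOKEN_RE and tokenize are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical helper; the pattern itself is the one used above.
TOKEN_RE = re.compile(r"\w+[.-]?|\s+")

def tokenize(text):
    """Return words (with an optional trailing '.' or '-') and whitespace runs."""
    return TOKEN_RE.findall(text)

a = 'Beautiful. is. better5-than ugly'
tokens = tokenize(a)
print(tokens)

# Lossless check: joining the tokens should give back the input.
print(''.join(tokens) == a)   # True for this input

# Characters matched by neither alternative are skipped by findall,
# so the comma is lost here.
print(tokenize('a, b'))       # ['a', ' ', 'b']
```

If the input may contain other punctuation, it would probably need to be added to the [.-] character class.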
\w+[.-]? matches words with an optional dot or hyphen at the end.
\s+ matches whitespace.
| makes sure we capture either of the above.
Upvotes: 2
Reputation: 16476
Since we want the delimiters to be part of the result, we have to split without consuming them, so I used both "lookbehind" and "lookahead" assertions in the regex. You can read about them in the re module's documentation.
import re
a = 'Beautiful. is. better5-than ugly'
print(re.split(r'(?<=[-. ])|(?= )', a))
Additional note: with the "lookbehind" assertion alone I could achieve almost the same result, but it leaves "than " with its trailing space attached, so I added a "lookahead" assertion to the pattern (the |(?= ) part) to split before that space too.
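To make the difference concrete, here is a small sketch comparing the two patterns on the same string. The variable names are only for illustration. Note that re.split only splits on empty matches like these in Python 3.7 and later.

```python
import re

a = 'Beautiful. is. better5-than ugly'

# Lookbehind only: split after every '-', '.' or ' '.
behind_only = re.split(r'(?<=[-. ])', a)

# Lookbehind plus lookahead: additionally split before every space.
combined = re.split(r'(?<=[-. ])|(?= )', a)

print(behind_only)
# ['Beautiful.', ' ', 'is.', ' ', 'better5-', 'than ', 'ugly']
print(combined)
# ['Beautiful.', ' ', 'is.', ' ', 'better5-', 'than', ' ', 'ugly']
```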
Upvotes: 2