Reputation: 78
I need to split a string by multiple delimiters.
My string is HELLO+WORLD-IT*IS=AMAZING.
I would like the result to be
["HELLO", "+", "WORLD", "-", "IT", "*", "IS", "=", "AMAZING"]
I have heard that re.findall() may be able to handle this, but I can't work out the solution.
Upvotes: 0
Views: 143
Reputation: 103874
Given:
s='HELLO+WORLD-IT*IS=AMAZING'
You can split at every transition between a word character and a non-word character, as a general case, with the word boundary assertion \b:
>>> re.split(r'\b', s)
['', 'HELLO', '+', 'WORLD', '-', 'IT', '*', 'IS', '=', 'AMAZING', '']
And remove the '' at the start and end like so:
>>> re.split(r'\b', s)[1:-1]
['HELLO', '+', 'WORLD', '-', 'IT', '*', 'IS', '=', 'AMAZING']
Or, if you know that this is the full set of delimiters you want to split on, define a character class of them and capture the delimiter:
>>> re.split(r'([+\-*=])', s)
['HELLO', '+', 'WORLD', '-', 'IT', '*', 'IS', '=', 'AMAZING']
Since \b is a zero-width assertion (it matches a position without consuming any characters), you don't have to capture the delimiter that caused the split. \b also matches at the start and end of the string when the string begins and ends with a word character, as it does here, so those empty strings need to be removed.
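The [1:-1] slice relies on the string starting and ending with a word character. A minimal sketch of a more tolerant variant, which filters out empty strings instead of slicing, might look like the following. The helper name split_on_word_boundaries is made up for illustration, and splitting on an empty match such as \b needs Python 3.7 or later.

```python
import re

def split_on_word_boundaries(text):
    # Hypothetical helper: split at every \b position, then drop the
    # empty strings that appear when a boundary sits at an edge.
    return [token for token in re.split(r'\b', text) if token]

print(split_on_word_boundaries('HELLO+WORLD-IT*IS=AMAZING'))
# A string that starts with a delimiter has no boundary at position 0,
# so slicing with [1:-1] would lose the leading '+'; filtering keeps it.
print(split_on_word_boundaries('+HELLO'))
```

Note that with \b a run of adjacent delimiters stays together as a single token, because there is no word boundary between two non-word characters.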
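Since - is used in a character class to define a range of characters such as [0-9], you have to escape the - in [+\-*=] (or place it first or last in the class). Left unescaped, +-* would be read as a range from + to *, which is invalid.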
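As a rough illustration of the escaping point, the sketch below builds the same character class two other ways: by letting re.escape do the escaping, and by placing the hyphen last so it cannot be read as a range. The variable names are assumptions chosen for this example.

```python
import re

s = 'HELLO+WORLD-IT*IS=AMAZING'
delimiters = '+-*='  # assumed delimiter set, taken from the question

# Option 1: let re.escape add the backslashes, then wrap the result in a
# character class inside a capturing group.
escaped_class = '([' + re.escape(delimiters) + '])'

# Option 2: put the hyphen last, where it is taken literally.
hyphen_last = r'([+*=-])'

tokens_escaped = re.split(escaped_class, s)
tokens_hyphen_last = re.split(hyphen_last, s)

print(tokens_escaped)
print(tokens_escaped == tokens_hyphen_last)
```

Building the class with re.escape is mostly useful when the delimiter set comes from a variable rather than being typed into the pattern by hand.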
Upvotes: 2
Reputation: 299
Using re.split works in this case. Put every delimiter in a capturing group:
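import re
string = "HELLO+WORLD-IT*IS=AMAZING"
pattern = r"(\+|-|\*|=)"  # raw string, so the backslashes reach the regex engine unchanged
result = re.split(pattern, string)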
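Since the question mentions re.findall(), here is one possible sketch of that route, assuming the delimiters are exactly +, -, * and =. Instead of splitting, it matches tokens directly: either a run of non-delimiter characters or a single delimiter. The names token_pattern and tokens are illustrative only.

```python
import re

s = 'HELLO+WORLD-IT*IS=AMAZING'

# First alternative: one or more characters that are not delimiters.
# Second alternative: exactly one delimiter character.
token_pattern = r'[^+\-*=]+|[+\-*=]'

tokens = re.findall(token_pattern, s)
print(tokens)
```

One difference from re.split with a capturing group is that findall never yields empty strings, so input that starts with a delimiter or has two delimiters in a row needs no extra cleanup.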
Upvotes: 2