Eakan

Reputation: 78

How to split a string by multiple delimiters and also keep the delimiters in Python?

I need to split a string by multiple delimiters. My string is HELLO+WORLD-IT*IS=AMAZING.

I would like the result to be ["HELLO", "+", "WORLD", "-", "IT", "*", "IS", "=", "AMAZING"].

I have heard that re.findall() may handle it, but I can't work out the solution.

Upvotes: 0

Views: 143

Answers (2)

dawg

Reputation: 103874

Given:

import re
s = 'HELLO+WORLD-IT*IS=AMAZING'

As a general case, you can split at every boundary between a word character and a non-word character with the word boundary assertion \b (splitting on an empty match requires Python 3.7 or later):

>>> re.split(r'\b', s)
['', 'HELLO', '+', 'WORLD', '-', 'IT', '*', 'IS', '=', 'AMAZING', '']

Then remove the '' at the start and end with a slice:

>>> re.split(r'\b', s)[1:-1]
['HELLO', '+', 'WORLD', '-', 'IT', '*', 'IS', '=', 'AMAZING']

Or, if you know the full set of delimiters you want to split on, define a character class containing them and capture the delimiter so that it is kept in the result:

>>> re.split(r'([+\-*=])', s)
['HELLO', '+', 'WORLD', '-', 'IT', '*', 'IS', '=', 'AMAZING']

Since \b is a zero-width assertion (it matches a position without consuming any characters), you don't have to capture the delimiter that caused the split. \b also matches at the start and end of the string when the string begins and ends with a word character, so those empty strings need to be removed.
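One caveat with the [1:-1] slice is that it assumes the string both starts and ends with a word character. A minimal sketch of a more defensive variant, which filters out empty strings instead of slicing, might look like this (split_on_boundaries is a hypothetical helper name, and Python 3.7 or later is assumed):

```python
import re

def split_on_boundaries(text):
    """Split at every word boundary and drop empty strings (hypothetical helper)."""
    # Assumes Python 3.7+, where re.split() can split on an empty match such as \b.
    return [piece for piece in re.split(r'\b', text) if piece]

print(split_on_boundaries('HELLO+WORLD-IT*IS=AMAZING'))
# A string that starts with a delimiter has no empty string at the front,
# so slicing with [1:-1] would drop the leading '+'; filtering keeps it.
print(split_on_boundaries('+HELLO'))
```

Filtering costs one extra pass over the list but does not depend on where the empty strings happen to fall.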

Since - is used in a character class to define a range of characters, such as [0-9], you have to escape the - in [+\-*=] (or place it first or last in the class, where it is taken literally).
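If the delimiter set is held in a variable, one way to avoid hand-escaping is to let re.escape() do it when building the class. This is only a sketch: the names delimiters, pattern, parts, and parts_alt are illustrative and not part of the original answer.

```python
import re

s = 'HELLO+WORLD-IT*IS=AMAZING'
delimiters = '+-*='  # assumed delimiter set, taken from the question's string

# re.escape() backslash-escapes the regex-special characters, including '-',
# so the hyphen cannot be misread as a range inside the character class.
pattern = '([' + re.escape(delimiters) + '])'
parts = re.split(pattern, s)
print(parts)

# Alternative: put the hyphen last in the class, where it is literal.
parts_alt = re.split(r'([+*=-])', s)
print(parts_alt)
```

Both patterns should give the same list as the hand-escaped [+\-*=] version.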

Upvotes: 2

sommervold

Reputation: 299

Using re.split works in this case. Put every delimiter in a capturing group:

    import re
    pattern = r"(\+|-|\*|=)"  # raw string, so the backslashes reach the regex engine unchanged
    result = re.split(pattern, string)
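Since the question mentions re.findall(), here is a rough sketch of that route as well. Instead of splitting, it matches either a run of non-delimiter characters or a single delimiter. The variable names are illustrative, and the delimiter set is assumed to be the four characters from the question.

```python
import re

s = 'HELLO+WORLD-IT*IS=AMAZING'

# Either one or more characters that are not delimiters, or exactly one delimiter.
tokens = re.findall(r'[^+\-*=]+|[+\-*=]', s)
print(tokens)

# With adjacent delimiters, findall() yields no empty strings,
# whereas re.split() with a capturing group puts '' between them.
print(re.findall(r'[^+\-*=]+|[+\-*=]', 'A+-B'))
print(re.split(r'([+\-*=])', 'A+-B'))
```

Which behaviour is preferable depends on whether empty fields between consecutive delimiters matter for your data.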

Upvotes: 2
