Reputation: 580

Python split string retaining the bracket

I would like to split a string such as the following into tokens, dropping the whitespace but keeping each bracket and punctuation mark as a separate token:

double a[3] = {0.0, 0.0, 0.0};

The expected output is

['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']

How could I do that with the re module in Python?

Upvotes: 0

Answers (3)

Andrej Kesely

Reputation: 195458

Solution without re:

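from itertools import groupby

s = "double a[3] = {0.0, 0.0, 0.0};"
delimiters = "[]{};,"  # characters that become tokens of their own

out = []
for word in s.split():  # split() already removes the whitespace
    # Group consecutive characters by whether they are delimiters.
    for is_delim, group in groupby(word, lambda ch: ch in delimiters):
        if is_delim:
            out.extend(group)           # each delimiter is its own token
        else:
            out.append("".join(group))  # join the run back into one token

print(out)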

Prints:

['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']
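To see why delimiter runs are extended character by character while other runs are joined, it can help to look at what groupby produces for a single word. A small sketch; the helper name groups is made up for illustration:

```python
from itertools import groupby

delimiters = "[]{};,"

def groups(word):
    # Hypothetical helper: pair each run's key (True = delimiter run) with the run's text.
    return [(is_delim, "".join(run))
            for is_delim, run in groupby(word, lambda ch: ch in delimiters)]

print(groups("a[3]"))   # [(False, 'a'), (True, '['), (False, '3'), (True, ']')]
print(groups("0.0};"))  # [(False, '0.0'), (True, '};')]
```

Consecutive delimiters such as "};" come out as a single run, which is why the loop uses out.extend(...) to emit them as separate one-character tokens.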

Upvotes: 0

AKX

Reputation: 169051

You can make use of the fact that re.split() includes the text matched by capturing groups in its output list:

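import re

input_string = "double a[3] = {0.0, 0.0, 0.0};"
# Capture floats, the punctuation , } = ; and words so re.split() keeps them;
# anything between matches (spaces, '[', ']', '{') is kept as leftover text.
pieces = re.split(r'((?:\d+\.\d+)|[,}=;]|\w+)', input_string)
# Strip whitespace from every piece and drop the ones that end up empty.
bits = [bit for bit in (bit.strip() for bit in pieces) if bit]
expected = ['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']
assert bits == expected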
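A short sketch of the re.split() behavior this answer relies on, plus one side effect worth knowing: characters the pattern does not capture (here '[', ']' and '{') only come through as leftover text between matches, so two of them in a row stay glued together. The helper tokenize and the extended pattern are illustrative assumptions, not part of the original answer:

```python
import re

text = "a, b"

# Without a capturing group, the matched separators are dropped.
plain = re.split(r",\s*", text)   # ['a', 'b']

# With a capturing group, the matched separators are kept.
kept = re.split(r"(,)\s*", text)  # ['a', ',', 'b']

def tokenize(s, pattern):
    # Hypothetical helper: split, keep captured matches, drop whitespace-only leftovers.
    return [bit.strip() for bit in re.split(pattern, s) if bit.strip()]

original = r'((?:\d+\.\d+)|[,}=;]|\w+)'
# '[' and ']' are not captured, so adjacent ones end up in one leftover piece.
print(tokenize("a[3][4]", original))  # ['a', '[', '3', '][', '4', ']']

# Hypothetical variant that captures the brackets explicitly.
extended = r'((?:\d+\.\d+)|[\[\]{},=;]|\w+)'
print(tokenize("a[3][4]", extended))  # ['a', '[', '3', ']', '[', '4', ']']
```

Listing every punctuation character inside the capturing group, as in the extended pattern, keeps each one as a separate token and still yields the expected list for the original input.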

Upvotes: 2

Tim Biegeleisen

Reputation: 521457

One approach here might be to use re.findall:

import re

inp = "double a[3] = {0.0, 0.0, 0.0};"
parts = re.findall(r'\d+(?:\.\d+)?|\w+|[^\s\w]', inp)
print(parts)

# ['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']

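The regex pattern used here matches, at each position, the first of these alternatives that succeeds:

\d+(?:\.\d+)? an integer or float (such as 3 or 0.0)
| OR
\w+ a word (such as double)
| OR
[^\s\w] a single character that is neither a word character nor whitespace (such as {)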
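Because the alternatives are tried from left to right at each position, their order matters: the number alternative has to come before \w+, or \w+ will claim the digits in front of the decimal point and the float gets split. A quick comparison, using a made-up input string:

```python
import re

text = "x = 0.25;"

# Number alternative first: '0.25' stays one token.
good = re.findall(r'\d+(?:\.\d+)?|\w+|[^\s\w]', text)

# \w+ first: it matches '0' (since '.' is not a word character), splitting the float.
bad = re.findall(r'\w+|\d+(?:\.\d+)?|[^\s\w]', text)

print(good)  # ['x', '=', '0.25', ';']
print(bad)   # ['x', '=', '0', '.', '25', ';']
```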

Upvotes: 0
