Reputation: 580
I would like to split a string such as the following into tokens and discard the whitespace:
double a[3] = {0.0, 0.0, 0.0};
The expected output is
['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']
How can I do that with the re module in Python?
Upvotes: 0
Views: 63
Reputation: 195458
Solution without re:
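from itertools import groupby

s = "double a[3] = {0.0, 0.0, 0.0};"
delimiters = "[]{};,"
out = []
for word in s.split():  # splitting on whitespace also discards it
    # Group consecutive characters by whether they are delimiters.
    for is_delim, group in groupby(word, lambda ch: ch in delimiters):
        if is_delim:
            # Each delimiter becomes its own token, e.g. "};" -> "}", ";".
            out.extend(group)
        else:
            out.append("".join(group))
print(out)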
Prints:
['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']
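To see why the delimiter groups are extended character by character, it may help to look at what groupby yields for a single word. This is only a small sketch using the same delimiter set as above; the variable names `word` and `groups` are made up for illustration.

```python
from itertools import groupby

delimiters = "[]{};,"
word = "0.0};"

# Consecutive characters that share the same key (delimiter or not)
# end up in one group, so "}" and ";" arrive together.
groups = [
    (is_delim, "".join(chars))
    for is_delim, chars in groupby(word, lambda ch: ch in delimiters)
]
print(groups)  # [(False, '0.0'), (True, '};')]
```

Because adjacent delimiters land in one group, the answer adds them to the output one character at a time instead of joining them.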
Upvotes: 0
Reputation: 169051
You can make use of the fact that re.split() retains delimiters in capture groups in the output:
import re
input_string = "double a[3] = {0.0, 0.0, 0.0};"
bits = [bit for bit in (bit.strip() for bit in re.split(r'((?:\d+\.\d+)|[,}=;]|\w+)', input_string)) if bit]
expected = ['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']
assert bits == expected
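As a smaller illustration of that re.split() behavior, here is a sketch comparing the same split with and without a capture group. The sample string and variable names are made up for the example.

```python
import re

text = "a, b"

# Without a capture group, the matched delimiter is dropped.
without_group = re.split(r",", text)

# With a capture group, the matched delimiter is kept as its own element.
with_group = re.split(r"(,)", text)

print(without_group)  # ['a', ' b']
print(with_group)     # ['a', ',', ' b']
```

The leftover pieces still carry their whitespace, which is why the answer strips each piece and filters out the empty ones.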
Upvotes: 2
Reputation: 521457
One approach here might be to use re.findall:
import re
inp = "double a[3] = {0.0, 0.0, 0.0};"
parts = re.findall(r'\d+(?:\.\d+)?|\w+|[^\s\w]', inp)
print(parts)
# ['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']
The regex pattern used here says to match:
\d+(?:\.\d+)?    an integer or float
|                OR
\w+              a word (such as "double")
|                OR
[^\s\w]          a single non-word, non-whitespace character (such as {)
Upvotes: 0
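If the pattern is going to be reused, one option is to write the same three alternatives with re.VERBOSE so the explanation lives next to the regex. This is only a sketch: `TOKEN_RE` and `tokenize` are hypothetical names, not something from the answers above.

```python
import re

# The same three alternatives as in the findall answer, one per line.
# In VERBOSE mode, whitespace and "#" comments in the pattern are ignored.
TOKEN_RE = re.compile(
    r"""
      \d+(?:\.\d+)?   # an integer or a float, such as 3 or 0.0
    | \w+             # a word, such as double
    | [^\s\w]         # one character that is neither word nor whitespace
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the tokens of text, skipping whitespace (hypothetical helper)."""
    return TOKEN_RE.findall(text)


print(tokenize("double a[3] = {0.0, 0.0, 0.0};"))
```

Whitespace never matches any alternative, so findall simply skips over it and no stripping or filtering step is needed.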