Reputation: 2131
The code below comes from the accepted answer to this question:
import re
pathD = "M30,50.1c0,0,25,100,42,75s10.3-63.2,36.1-44.5s33.5,48.9,33.5,48.9l24.5-26.3"
print(re.findall(r'[A-Za-z]|-?\d+\.\d+|\d+',pathD))
['M', '30', '50.1', 'c', '0', '0', '25', '100', '42', '75', 's', '10.3', '-63.2', '36.1', '-44.5', 's', '33.5', '48.9', '33.5', '48.9', 'l', '24.5', '-26.3']
If I include symbols such as '$' or '£' in the pathD variable, the re expression skips them, as it targets only [A-Za-z] and digits:
[A-Za-z] # single letters
|
-?\d+\.\d+ # floating point numbers
|
\d+ # integers
How do I modify the regex pattern above so that it also keeps non-alphanumeric symbols, as in the desired output below?
new_pathD = '$100.0thousand'
new_re_expression = ???
print(re.findall(new_re_expression, new_pathD))
['$', '100.0', 'thousand']
Related SO posts are listed below, although none of them shows how to keep the symbols when splitting:
Split string into letters and numbers
split character data into numbers and letters
Python regular expression split string into numbers and text/symbols
Python - Splitting numbers and letters into sub-strings with regular expression
Upvotes: 1
Views: 512
Reputation: 37317
Try this:
import re
compiled = re.compile(r'[A-Za-z]+|-?\d+\.\d+|\d+|\W')
compiled.findall("$100.0thousand")
# ['$', '100.0', 'thousand']
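For what it's worth, here is the same alternation written with re.VERBOSE so each branch can carry a comment. This is only a sketch: the name token_re is mine, and the second input string is made up for illustration. The key change from the pattern in the question is the trailing \W branch, which picks up any single non-word character that the other branches did not claim, plus the + on [A-Za-z] so letters come back as one run.

```python
import re

# Same alternation as the one-liner above, spelled out with re.VERBOSE.
# The name token_re is illustrative only.
token_re = re.compile(r"""
    [A-Za-z]+      # a run of ASCII letters
  | -?\d+\.\d+     # a float with an optional leading minus
  | \d+            # an unsigned integer
  | \W             # any single non-word character (symbols, spaces, commas)
""", re.VERBOSE)

print(token_re.findall("$100.0thousand"))  # ['$', '100.0', 'thousand']
print(token_re.findall("£24.5-26.3"))      # ['£', '24.5', '-26.3']
```

Keep in mind that \W also matches whitespace and commas, so those come back as tokens too, while an underscore counts as a word character and is still skipped.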
Here's an Advanced Edition™ that groups runs of symbols together and keeps a leading minus sign attached to its number:
advanced_edition = re.compile(r'[A-Za-z]+|-?\d+(?:\.\d+)?|(?:[^\w-]+|-(?!\d))+')
The difference is:
compiled.findall("$$$-100thousand") # ['$', '$', '$', '-', '100', 'thousand']
advanced_edition.findall("$$$-100thousand") # ['$$$', '-100', 'thousand']
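To see how the last branch of the advanced pattern treats hyphens, a few extra inputs may help. These test strings are my own, not from the question, so treat this as a rough sketch. The branch (?:[^\w-]+|-(?!\d))+ collects runs of symbols, and it only takes a hyphen when the next character is not a digit. Otherwise the hyphen is left for the number branch to use as a sign.

```python
import re

# Same pattern as advanced_edition above, repeated so the snippet runs on its own.
advanced = re.compile(r'[A-Za-z]+|-?\d+(?:\.\d+)?|(?:[^\w-]+|-(?!\d))+')

# Hyphens not followed by a digit are grouped with neighbouring symbols.
print(advanced.findall("a--b"))  # ['a', '--', 'b']
print(advanced.findall("$-x"))   # ['$-', 'x']

# A hyphen directly before a digit is always read as a minus sign,
# so "5-3" tokenizes as 5 and -3 rather than 5, -, 3.
print(advanced.findall("5-3"))   # ['5', '-3']
```

If a hyphen between two numbers should count as a separator instead, the number branch would need a lookbehind or some post-processing. That goes beyond what the question asked for.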
Upvotes: 4