Reputation: 5138
I need to parse the numeric values from a string that is not well formatted. Example:
"0 0 .1 .05 .05 0. 0. .01"
or
"0,0,.1,.05,.05,0.,0.,.01"
As you can see, the delimiter can vary from several spaces to commas with no spaces. Also, the numbers may be ints or floats. I would like to split on any number of consecutive spaces, tabs, and commas. I thought I could do this with the str.split() function; however, I found that it only works with one delimiter argument and will not split on commas by default.
Does anyone know a clever way to do this? Possibly with regular expressions?
Thanks in advance.
Upvotes: 2
Views: 2141
Reputation: 67968
You can use re.split with the pattern [ ,]+ to split on runs of spaces and commas.
import re
y="0,0,.1,.05,.05,0.,0.,.01"
print(re.split(r"[ ,]+", y))
Or you can simply use re.findall. Here you can have any delimiter.
import re
y="0,0,.1,.05,.05,0.,0.,.01"
# Match digits with an optional fraction ("0", "0.") or a bare fraction (".05")
print(re.findall(r"\d+\.?\d*|\.\d+", y))
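To illustrate the "any delimiter" point: with findall the pattern describes the numbers rather than the separators, so it needs at least one mandatory digit, because a pattern whose every part is optional also matches the empty string. A minimal sketch, assuming unsigned decimals with no sign or exponent; the name NUMBER and the sample string are made up for illustration:

```python
import re

# Hypothetical pattern name. Assumes unsigned decimals, no sign, no exponent.
# First alternative: digits with an optional fraction ("0", "0.", "1.5").
# Second alternative: a bare fraction (".05").
NUMBER = re.compile(r"\d+\.?\d*|\.\d+")

# Made-up input mixing several delimiters.
messy = "0;0|.1 , .05\t0."
tokens = NUMBER.findall(messy)
values = [float(t) for t in tokens]
```

Negative numbers or scientific notation would need a wider pattern than this one.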
Upvotes: 2
Reputation: 70732
I would like to split on any number of consecutive spaces, tabs, and commas.
You could use re.split() to split by a regular expression.
>>> import re
>>> s = '0 0 .1 .05 .05 0. 0. .01'
>>> re.split(r'[\s,]+', s)
['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']
Note: The above will split on any whitespace and on commas. If you want to split strictly on spaces, tabs and commas, you could change the regular expression to [ \t,]+.
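One edge case worth knowing: if the string starts or ends with a delimiter, re.split() returns an empty string at that end. A small sketch of filtering those out; the helper name split_fields is hypothetical:

```python
import re

def split_fields(text):
    # Hypothetical helper: split on runs of whitespace/commas and drop the
    # empty strings that appear when text begins or ends with a delimiter.
    return [tok for tok in re.split(r'[\s,]+', text) if tok]

raw = re.split(r'[\s,]+', ' 0,0 ,\t.1,, .05 ')
clean = split_fields(' 0,0 ,\t.1,, .05 ')
```

Here raw has an empty string at each end, while clean holds only the four number tokens.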
Upvotes: 3
Reputation: 28606
Regular expressions would work, but you could also just replace every comma with a space and then use the regular split:
s.replace(',', ' ').split()
Demo:
>>> s = "0 0 .1 .05 .05 0. 0. .01"
>>> s.replace(',', ' ').split()
['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']
>>> s = "0,0,.1,.05,.05,0.,0.,.01"
>>> s.replace(',', ' ').split()
['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']
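Since the goal is numeric values, the same idea extends to a conversion step. A minimal sketch, assuming every token is something float() accepts; the function name to_floats is made up:

```python
def to_floats(s):
    # Hypothetical helper: turn commas into spaces, split on whitespace,
    # then convert each token. float() accepts forms like '0.' and '.05'.
    return [float(tok) for tok in s.replace(',', ' ').split()]

a = to_floats("0 0 .1 .05 .05 0. 0. .01")
b = to_floats("0,0,.1,.05,.05,0.,0.,.01")
```

Because split() with no arguments splits on any run of whitespace and ignores leading and trailing whitespace, tabs are handled too and no empty strings reach float().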
Upvotes: 3
Reputation: 59611
You can split with the following regex: [,\s]+
Example:
import re
pattern = r'[,\s]+'
row = "0 0 .1 .05 .05 0. 0. .01"
re.split(pattern, row)
# > ['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']
row = "0,0,.1,.05,.05,0.,0.,.01"
re.split(pattern, row)
# > ['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']
Upvotes: 0