nalyd88

Reputation: 5138

Python - Split string of numeric values with unknown delimiters

I need to parse the numeric values from a string that is not well formatted. Example:

"0    0    .1        .05       .05       0.        0.         .01"

or

"0,0,.1,.05,.05,0.,0.,.01"

As you can see, the delimiter can vary from several spaces to commas with no spaces. Also, the numbers may be ints or floats. I would like to split on any number of consecutive spaces, tabs, and commas. I thought I could do this with str.split(), but it accepts only a single delimiter argument, and its default behavior splits on whitespace only, not commas.

Does anyone know a clever way to do this? Possibly with regular expressions?

Thanks in advance.

Upvotes: 2

Views: 2141

Answers (4)

vks

Reputation: 67968

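You can use re.split with the following pattern, which matches one or more consecutive spaces or commas (add \t inside the brackets to cover tabs as well):

[ ,]+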

import re
y="0,0,.1,.05,.05,0.,0.,.01"
print(re.split(r"[ ,]+", y))

Alternatively, you can use re.findall to extract the numbers directly. That way it does not matter what the delimiter is.

import re
y="0,0,.1,.05,.05,0.,0.,.01"
# Match digits with an optional fractional part ("0", "0.", "0.5") or a bare fraction (".05")
print(re.findall(r"\d+\.?\d*|\.\d+", y))

Upvotes: 2

hwnd

Reputation: 70732

I would like to split on any number of consecutive spaces, tabs, and commas.

You could use re.split() to split by a regular expression.

>>> import re
>>> s = '0    0    .1        .05       .05       0.        0.         .01'
>>> re.split(r'[\s,]+', s)
['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']

Note: The above splits on commas and on any whitespace character, which includes newlines as well as spaces and tabs. If you want to split strictly on spaces, tabs, and commas, change the regular expression to [ \t,]+
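To make the difference between the two character classes concrete, here is a small sketch. The sample string is made up for illustration and is not from the question; it contains a newline so the two patterns give different results.

```python
import re

# Hypothetical input mixing a comma, a space, a tab, and a newline.
s = "0, 0\t.1\n.05"

# \s covers every whitespace character, so the newline is a delimiter too.
loose = re.split(r"[\s,]+", s)

# [ \t,] covers only space, tab, and comma, so the newline stays in the token.
strict = re.split(r"[ \t,]+", s)

print(loose)   # ['0', '0', '.1', '.05']
print(strict)  # ['0', '0', '.1\n.05']
```

Which one is preferable probably depends on whether the input can span several lines.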

Upvotes: 3

Stefan Pochmann

Reputation: 28606

Regular expressions would work, but you could also just replace every comma with a space and then use the regular str.split(), which with no arguments splits on any run of whitespace:

s.replace(',', ' ').split()

Demo:

>>> s = "0    0    .1        .05       .05       0.        0.         .01"
>>> s.replace(',', ' ').split()
['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']

>>> s = "0,0,.1,.05,.05,0.,0.,.01"
>>> s.replace(',', ' ').split()
['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']
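Since the question asks for numeric values, the resulting tokens could then be passed to float(), which accepts forms like ".1" and "0.". A minimal sketch, assuming every value should become a float; the helper name parse_numbers is hypothetical:

```python
def parse_numbers(s):
    """Hypothetical helper: split on commas/whitespace and convert each token to float."""
    # Turning commas into spaces lets str.split() handle every delimiter at once.
    return [float(token) for token in s.replace(',', ' ').split()]

spaced = parse_numbers("0    0    .1        .05       .05       0.        0.         .01")
commas = parse_numbers("0,0,.1,.05,.05,0.,0.,.01")

print(spaced)  # [0.0, 0.0, 0.1, 0.05, 0.05, 0.0, 0.0, 0.01]
print(commas)  # [0.0, 0.0, 0.1, 0.05, 0.05, 0.0, 0.0, 0.01]
```

A token that is not a valid number would raise ValueError here, so malformed input would need separate handling.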

Upvotes: 3

Martin Konecny

Reputation: 59611

You can split with the following regex: [,\s]+

Example:

import re

pattern = r'[,\s]+'

row = "0    0    .1        .05       .05       0.        0.         .01"
re.split(pattern, row)
# > ['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']

row = "0,0,.1,.05,.05,0.,0.,.01"
re.split(pattern, row)
# > ['0', '0', '.1', '.05', '.05', '0.', '0.', '.01']
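One thing to watch for with re.split: if the row starts or ends with a delimiter, the result contains empty strings at the edges. The sketch below uses a made-up row (not from the question) to show this, along with one possible way to drop the empty tokens.

```python
import re

pattern = r'[,\s]+'

# Hypothetical row with a leading space and a trailing comma.
row = " 0, .1 ,"

raw = re.split(pattern, row)
print(raw)      # ['', '0', '.1', '']

# Keep only the non-empty tokens.
cleaned = [token for token in raw if token]
print(cleaned)  # ['0', '.1']
```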

Upvotes: 0
