Reputation: 1911
I'm trying to extract numeric values from text strings that use dashes both as delimiters and to indicate negative values:
"1.3" # [1.3]
"1.3-2-3.9" # [1.3, 2, 3.9]
"1.3-2--3.9" # [1.3, 2, -3.9]
"-1.3-2--3.9" # [-1.3, 2, -3.9]
At the moment, I'm manually checking for the "--" sequence, but this seems really ugly and prone to breaking.
def get_values(text):
    return map(lambda s: s.replace('n', '-'), text.replace('--', '-n').split('-'))
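The placeholder trick above fails on the last example. A leading minus sign has no second dash to mark, so "-1.3-2--3.9" yields ['', '1.3', '2', '-3.9']. A minimal patch is to mark a leading dash as well. It assumes, as the original does, that the input never contains a literal "n" (the placeholder):

```python
def get_values(text):
    # Mark each "--" (a delimiter followed by a minus sign) with the placeholder "n".
    marked = text.replace("--", "-n")
    # A dash at the very start is also a minus sign, so mark it too.
    if marked.startswith("-"):
        marked = "n" + marked[1:]
    # Every remaining "-" is now a delimiter; restore the minus signs afterwards.
    return [piece.replace("n", "-") for piece in marked.split("-")]
```

It still relies on a sentinel character, which is the fragility the question is trying to avoid.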
I've tried a few different approaches, using both the str.split() method and re.findall(), but none of them have quite worked.
For example, the following pattern should match all the valid strings, but I'm not sure how to use it with findall:
r"^-?\d(\.\d*)?(--?\d(\.\d*)?)*$"
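The anchored pattern cannot feed findall directly. Because it contains capturing groups, findall returns the groups' contents (here a single tuple holding the last repetition of each group) instead of the individual numbers. One way to use it is to validate the whole string with fullmatch and then extract the numbers with a second, unanchored pattern. This sketch assumes that integer parts may have more than one digit, so \d is widened to \d+, and it converts the results to float:

```python
import re

# The question's validation pattern, with groups made non-capturing and
# \d widened to \d+ (assumption: multi-digit integer parts are allowed).
VALID = re.compile(r"-?\d+(?:\.\d*)?(?:--?\d+(?:\.\d*)?)*")

# Each number starts at the beginning of the string or right after a
# delimiter dash; the single capture group holds the number itself,
# including its optional minus sign.
NUMBER = re.compile(r"(?:^|-)(-?\d+(?:\.\d*)?)")


def get_values(text):
    """Return the numbers in a dash-delimited string as floats."""
    if not VALID.fullmatch(text):
        raise ValueError(f"not a valid dash-delimited list: {text!r}")
    # With exactly one group in the pattern, findall returns that group's text.
    return [float(s) for s in NUMBER.findall(text)]
```

For example, get_values("-1.3-2--3.9") gives [-1.3, 2.0, -3.9], while a malformed string such as "1.3---2" raises ValueError.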
Is there a general way to do this that I'm not seeing? Thanks!
Upvotes: 0
Views: 364
Reputation:
@CasimiretHippolyte has given a very elegant regex solution, but I would like to point out that you can do this quite succinctly with just a list comprehension, iter(), and next():
>>> def get_values(text):
...     it = iter(text.split("-"))
...     return [x or "-"+next(it) for x in it]
...
>>> get_values("1.3")
['1.3']
>>> get_values("1.3-2-3.9")
['1.3', '2', '3.9']
>>> get_values("1.3-2--3.9")
['1.3', '2', '-3.9']
>>> get_values("-1.3-2--3.9")
['-1.3', '2', '-3.9']
>>>
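The one-liner works because the comprehension and next() share the same iterator. An empty piece from split("-") means the dash just consumed was a minus sign, so next() pulls the following piece and prefixes it with "-". Written out as an explicit loop, the same idea for well-formed input looks roughly like this. The float() conversion is an added assumption, since the answer returns strings:

```python
def get_values(text):
    """Split a dash-delimited string into floats.

    A leading "-" and the second dash of "--" are treated as minus signs.
    """
    values = []
    negative = False
    for piece in text.split("-"):
        if not piece:
            # An empty piece comes from a dash that is a minus sign
            # (at the start of the string or right after a delimiter),
            # so the next non-empty piece is negative.
            negative = True
            continue
        values.append(float("-" + piece if negative else piece))
        negative = False
    return values
```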
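Also, if you use timeit.timeit, you will see that this solution is considerably faster than using regex. In the run below, it took about 4.1 seconds versus about 10.0 seconds for timeit's default of 1,000,000 calls: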
>>> from timeit import timeit
>>>
>>> # With Regex
>>> def get_values(text):
...     import re
...     return re.split('(?<=[0-9])-', text)
...
>>> timeit('get_values("-1.3-2--3.9")', 'from __main__ import get_values')
9.999720634885165
>>>
>>> # Without Regex
>>> def get_values(text):
...     it = iter(text.split("-"))
...     return [x or "-"+next(it) for x in it]
...
>>> timeit('get_values("-1.3-2--3.9")', 'from __main__ import get_values')
4.145546989910741
>>>
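The regex timing above runs `import re` inside the timed function and passes the pattern as a string. Each call therefore also pays for the import statement and for the pattern-cache lookup that re.split does when given a string. The sketch below precompiles the pattern once so that only the splitting is timed. It does not come with fixed numbers, because results depend on the machine and Python version. The function names and the 100,000-call count are arbitrary choices:

```python
import re
from timeit import timeit

# Compiled once at import time, so the timed calls do no import and no
# pattern-cache lookup.
DELIM = re.compile(r"(?<=[0-9])-")


def split_regex(text):
    # Split only on hyphens that directly follow a digit.
    return DELIM.split(text)


def split_iter(text):
    # The list-comprehension / shared-iterator approach from this answer.
    it = iter(text.split("-"))
    return [x or "-" + next(it) for x in it]


if __name__ == "__main__":
    sample = "-1.3-2--3.9"
    assert split_regex(sample) == split_iter(sample)
    for func in (split_regex, split_iter):
        seconds = timeit(lambda: func(sample), number=100_000)
        print(f"{func.__name__}: {seconds:.3f} s for 100,000 calls")
```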
Upvotes: 2
Reputation: 89565
You can split with this pattern, which uses a lookbehind:
(?<=[0-9])-
(a hyphen preceded by a digit)
>>> import re
>>> re.split('(?<=[0-9])-', text)
With this condition, a hyphen at the start of the string or right after another hyphen is never treated as a delimiter, so it stays attached to the following number as its minus sign.
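Applied to the question's examples, with the pieces converted to float (an assumption, since re.split returns strings), the lookbehind split might be wrapped like this:

```python
import re

# A hyphen is a delimiter only when the character before it is a digit;
# any other hyphen stays in the piece as a minus sign.
DELIMITER = re.compile(r"(?<=[0-9])-")


def get_values(text):
    return [float(piece) for piece in DELIMITER.split(text)]


for text in ["1.3", "1.3-2-3.9", "1.3-2--3.9", "-1.3-2--3.9"]:
    print(text, get_values(text))
```

Note that the lookbehind requires a digit before the delimiter, so a number ending in a bare decimal point (such as "1.-2") would not be split.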
Upvotes: 4