Tiro

Reputation: 1911

(Python) Splitting string only on single instance of delimiter

I'm trying to extract numeric values from text strings that use dashes both as delimiters and to indicate negative values:

"1.3"          # [1.3]
"1.3-2-3.9"    # [1.3, 2, 3.9]
"1.3-2--3.9"   # [1.3, 2, -3.9]
"-1.3-2--3.9"  # [-1.3, 2, -3.9]

At the moment, I'm manually checking for the "--" sequence, but this seems really ugly and prone to breaking.

def get_values(text):
    return map(lambda s: s.replace('n', '-'), text.replace('--', '-n').split('-'))

I've tried a few different approaches, using both the str.split() function and re.findall(), but none of them have quite worked.

For example, the following pattern should match all the valid strings, but I'm not sure how to use it with findall:

r"^-?\d(\.\d*)?(--?\d(\.\d*)?)*$"

Is there a general way to do this that I'm not seeing? Thanks!

Upvotes: 0

Views: 364

Answers (2)

user2555451

@CasimiretHippolyte has given a very elegant Regex solution, but I would like to point out that you can do this pretty succinctly with just a list comprehension, iter, and next:

>>> def get_values(text):
...    it = iter(text.split("-"))
...    return [x or "-"+next(it) for x in it]
...
>>> get_values("1.3")
['1.3']
>>> get_values("1.3-2-3.9")
['1.3', '2', '3.9']
>>> get_values("1.3-2--3.9")
['1.3', '2', '-3.9']
>>> get_values("-1.3-2--3.9")
['-1.3', '2', '-3.9']
>>>
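
To see why this works: str.split leaves an empty string wherever a dash starts the string or directly follows another dash, and `x or "-"+next(it)` swaps each empty string for a dash plus the piece after it. Here is a rough sketch of the same logic as an explicit loop (the name get_values_verbose is only for illustration). Like the one-liner, it assumes well-formed input and does not handle things like a trailing dash:

```python
def get_values_verbose(text):
    """Explicit-loop equivalent of the iter/next list comprehension."""
    pieces = iter(text.split("-"))
    values = []
    for piece in pieces:
        if piece:
            # Ordinary value: the dash before it was a delimiter.
            values.append(piece)
        else:
            # Empty piece: the dash was a minus sign, so pull the
            # next piece from the same iterator and re-attach the sign.
            values.append("-" + next(pieces))
    return values


print("-1.3-2--3.9".split("-"))           # ['', '1.3', '2', '', '3.9']
print(get_values_verbose("-1.3-2--3.9"))  # ['-1.3', '2', '-3.9']
```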

Also, if you use timeit.timeit, you will see that this solution is quite a bit faster than using Regex:

>>> from timeit import timeit
>>>
>>> # With Regex
>>> def get_values(text):
...     import re
...     return re.split('(?<=[0-9])-', text)
...
>>> timeit('get_values("-1.3-2--3.9")', 'from __main__ import get_values')
9.999720634885165
>>>
>>> # Without Regex
>>> def get_values(text):
...     it = iter(text.split("-"))
...     return [x or "-"+next(it) for x in it]
...
>>> timeit('get_values("-1.3-2--3.9")', 'from __main__ import get_values')
4.145546989910741
>>>

Upvotes: 2

Casimir et Hippolyte

Reputation: 89565

You can split on this pattern, which uses a lookbehind:

(?<=[0-9])-

(A hyphen preceded by a digit)

>>> import re
>>> text = "-1.3-2--3.9"
>>> re.split(r'(?<=[0-9])-', text)
['-1.3', '2', '-3.9']

This condition guarantees that the hyphen you split on is never at the start of the string or directly after another hyphen.
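
Since the question asks for numeric values, here is a minimal sketch that combines the split with a float conversion. The function name get_floats is just illustrative, and it assumes every number ends in a digit: a value written like "1." would not be split from what follows, because the next hyphen comes after a dot rather than a digit.

```python
import re


def get_floats(text):
    """Split on hyphens that follow a digit, then convert each piece to float."""
    return [float(piece) for piece in re.split(r"(?<=[0-9])-", text)]


print(get_floats("1.3-2-3.9"))    # [1.3, 2.0, 3.9]
print(get_floats("-1.3-2--3.9"))  # [-1.3, 2.0, -3.9]
```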

Upvotes: 4
