Andrius

Reputation: 21188

Python - defining string split delimiter?

How can I define the delimiters for splitting a string in the most efficient way, that is, without needing a long chain of ifs?

I have strings that need to be split into lists of exactly two elements. The problem is that the strings use different delimiters. For example:

'Hello: test1' uses the delimiter ': '. Another example is 'Hello - test1', where the delimiter is ' - '. The delimiter could also be ' -' or '- '. So if I know all the delimiter variations, how can I define them most efficiently?

First I did something like this:

strings = ['Hello - test', 'Hello- test', 'Hello -test']
for s in strings:
    delim = ' - '
    if len(s.split('- ', 1)) == 2:
        delim = '- '
    elif len(s.split(' -', 1)) == 2:
        delim = ' -'
    print(s.split(delim, 1)[1])

But then I got new strings with other, unexpected delimiters. Doing it this way, I would have to add even more ifs to check for delimiters like ': '. So I wondered whether there is a better way to define them (it is no problem if I have to add new delimiters to some kind of list later on). Maybe a regex would help, or some other tool?

Upvotes: 0

Views: 1426

Answers (4)

Augusta

Reputation: 7231

This isn't the best way, but if you want to avoid using re for some (or no) reason, this is what I would do:

>>> strings = ['Hello - test', 'Hello- test', 'Hello -test', 'Hello : test']
>>> delims = [':', '-']  # all possible delimiters; don't worry about spaces.
>>>
>>> for string in strings:
...     delim = next((d for d in delims if d in string), None) # finds the first delimiter in delims that's present in the string (if there is one)
...     if not delim:
...         continue  # No delimiter! (I don't know how you want to handle this possibility; this code will simply skip the string altogether.)
...     print([s.strip() for s in string.split(delim, 1)])  # assuming you want them in list form.
['Hello', 'test']
['Hello', 'test']
['Hello', 'test']
['Hello', 'test']

This uses Python's native .split() to break the string at the delimiter, and then .strip() to trim the white space off the results, if there is any. I've used next to find the appropriate delimiter, but there are plenty of things you can swap that out with (especially if you like for blocks).

If you're certain that each string will contain at least one of the delimiters (preferably exactly one), then you can shave it down to this:

>>> # with strings and delims defined...
>>> for string in strings:
...     delim = next(d for d in delims if d in string) # raises StopIteration at this line if there is no delimiter in the string.
...     print([s.strip() for s in string.split(delim, 1)])

I'm not sure if this is the most elegant solution, but it uses fewer if blocks, and you won't have to import anything to do it.

Upvotes: 0

Avinash Raj

Reputation: 174826

Put all the delimiters into a single re.split pattern, joined with the regex alternation operator |, like below.

re.split(r': | - | -|- ', string)

Add maxsplit=1 if you want to split only once.

re.split(r': | - | -|- ', string, maxsplit=1)
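As a rough sketch of how this behaves on the strings from the question (the names DELIMS, strings and results are just for illustration), note that ' - ' is listed before ' -' so that the longer form is tried first; with the order reversed, 'Hello - test1' would leave a leading space on the second element.

```python
import re

# Alternation of the literal delimiters. ' - ' comes before ' -' and '- '
# so the three-character form is tried first at any given position.
DELIMS = re.compile(r': | - | -|- ')

strings = ['Hello: test1', 'Hello - test1', 'Hello -test1', 'Hello- test1']

# maxsplit=1 stops after the first delimiter, so each result has at most two parts.
results = [DELIMS.split(s, maxsplit=1) for s in strings]

for parts in results:
    print(parts)  # each line prints ['Hello', 'test1']
```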

Upvotes: 4

vks

Reputation: 67988

\s*[:-]\s*

You can split on this pattern: use re.split(r"\s*[:-]\s*", string). See the demo:

https://regex101.com/r/nL5yL3/14

Use this if the delimiter can appear as ' - ', '- ' or ' -', possibly with multiple spaces on either side.
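A small sketch of that pattern in use, assuming you only want the first split (the sample strings and the names PATTERN, spaced, tight and hyphenated are made up for illustration). One caveat worth knowing: because the surrounding whitespace is optional, a hyphen inside a word also counts as a delimiter.

```python
import re

# \s* on both sides swallows any whitespace around ':' or '-'.
PATTERN = re.compile(r'\s*[:-]\s*')

spaced = PATTERN.split('Hello   -   test1', maxsplit=1)
tight = PATTERN.split('Hello:test1', maxsplit=1)
# The hyphen in 'well-known' is matched before the ': ' is ever reached.
hyphenated = PATTERN.split('well-known: value', maxsplit=1)

print(spaced)      # ['Hello', 'test1']
print(tight)       # ['Hello', 'test1']
print(hyphenated)  # ['well', 'known: value']
```

If hyphenated words can occur in your data, the stricter alternation of literal delimiters may be the safer choice.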

Upvotes: 0

fredtantini

Reputation: 16576

You can use the split function of the re module:

>>> import re
>>> strings = ['Hello1 - test1', 'Hello2- test2', 'Hello3 -test3', 'Hello4 :test4', 'Hello5 : test5']
>>> for s in strings:
...   re.split(" *[:-] *",s)
...
['Hello1', 'test1']
['Hello2', 'test2']
['Hello3', 'test3']
['Hello4', 'test4']
['Hello5', 'test5']

Put all the possible delimiter characters between the square brackets. Each " *" matches zero or more spaces, so the delimiter may have any number of spaces before or after it.
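Since the question asks for strictly two-element lists, one possible way to wrap this is sketched below. The helper name split_pair and the choice to raise ValueError when no delimiter is found are assumptions for illustration, not something the question specifies.

```python
import re

# Same pattern as above: any of the bracketed characters, with optional spaces.
_DELIM = re.compile(r' *[:-] *')

def split_pair(text):
    """Hypothetical helper: split text at the first delimiter into two parts."""
    parts = _DELIM.split(text, maxsplit=1)
    if len(parts) != 2:
        # No delimiter was found; how to handle this is an assumption.
        raise ValueError('no delimiter found in %r' % text)
    return parts

print(split_pair('Hello4 :test4'))  # ['Hello4', 'test4']
print(split_pair('a - b - c'))      # ['a', 'b - c']
```

To support a new delimiter character later, you would only add it inside the square brackets, keeping "-" last (or escaping it) so it is not read as a range.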

Upvotes: 1
