user4794127

Reputation: 193

Why are there extra empty strings at the beginning and end of the list returned by re.split?

I'm trying to take a string of ints and/or floats and create a list of floats. The string contains brackets that need to be ignored. I'm using re.split, but if my string begins and ends with a bracket, I get extra empty strings. Why is that?

Code:

import re
x = "[1 2 3 4][2 3 4 5]"
y = "1 2 3 4][2 3 4 5"
# Split on any run of characters that are neither digits nor dots
p = re.compile(r'[^\d.]+')
print(p.split(x))
print(p.split(y))

Output:

['', '1', '2', '3', '4', '2', '3', '4', '5', '']
['1', '2', '3', '4', '2', '3', '4', '5']

Upvotes: 15

Views: 5575

Answers (4)

anubhava

Reputation: 785551

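You can use filter to drop the empty strings from the result:

import re

x = "[1 2 3 4][2 3 4 5]"

# filter(None, ...) keeps only truthy items; list() is needed on Python 3,
# where filter returns an iterator
print(list(filter(None, re.split(r'[^\d.]+', x))))
# => ['1', '2', '3', '4', '2', '3', '4', '5']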
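As a small sketch of the same idea, a list comprehension with a truthiness test gives the same result as filter with None. The variable names here are only illustrative:

```python
import re

x = "[1 2 3 4][2 3 4 5]"
parts = re.split(r'[^\d.]+', x)

# Both forms keep only the non-empty strings.
via_filter = list(filter(None, parts))
via_comprehension = [s for s in parts if s]

print(via_comprehension)
```

The comprehension behaves the same on Python 2 and Python 3, since it always builds a list.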

Upvotes: 1

Florian Winter

Reputation: 5289

re.split returns the pieces of text between delimiter matches. If a delimiter matches at the very beginning or end of the string, the piece before or after it is empty, so the resulting list starts or ends with an empty string.

If you don't want this, use re.findall with a regex that matches every sequence NOT containing delimiters.

Example:

import re

a = '[1 2 3 4]'
print(re.split(r'[^\d]+', a))
print(re.findall(r'[\d]+', a))

Output:

['', '1', '2', '3', '4', '']
['1', '2', '3', '4']
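One way to see why the empty strings appear, sketched here with the pattern from the question: for a pattern without capture groups that never matches an empty string, split returns exactly one more piece than there are delimiter matches, and interleaving pieces and delimiters rebuilds the original string. The variable names are illustrative.

```python
import re

p = re.compile(r'[^\d.]+')
x = "[1 2 3 4][2 3 4 5]"

delims = p.findall(x)   # every delimiter run, in order
parts = p.split(x)      # the text between those runs

# N delimiters always produce N + 1 pieces, even if some pieces are empty.
print(len(delims), len(parts))

# Interleaving pieces and delimiters reconstructs the input exactly.
rebuilt = parts[0] + ''.join(d + s for d, s in zip(delims, parts[1:]))
print(rebuilt == x)
```

The leading '[' has nothing before it and the trailing ']' has nothing after it, so the first and last pieces are empty strings.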

As others have pointed out in their answers, this may not be the perfect solution for this problem, but it is a general answer to the problem described in the title of the question, which I also had to solve when I found this question using Google.

Upvotes: 10

Kasravnd

Reputation: 107347

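A more Pythonic way is to use a list comprehension with str.isdigit() to keep only the digit characters. Note that this works character by character, so it only suits single-digit integers; multi-digit numbers and floats would be broken apart: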

>>> [i for i in y if i.isdigit()]
['1', '2', '3', '4', '2', '3', '4', '5']

As for your code: first, you need to split on spaces or brackets, which can be done with [\[\] ]+. To get rid of the empty strings caused by leading and trailing brackets, you can strip the string first:

>>> y =  "1 2 3 4][2 3 4 5"
>>> re.split(r'[\[\] ]+',y)
['1', '2', '3', '4', '2', '3', '4', '5']
>>> y =  "[1 2 3 4][2 3 4 5]"
>>> re.split(r'[\[\] ]+',y)
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
>>> re.split(r'[\[\] ]+',y.strip('[]'))
['1', '2', '3', '4', '2', '3', '4', '5']

You can also pass the result through filter with bool to drop the empty strings (list() is needed on Python 3, where filter returns an iterator):

>>> list(filter(bool, re.split(r'[\[\] ]+', y)))
['1', '2', '3', '4', '2', '3', '4', '5']
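Putting the strip-then-split approach together, a minimal sketch that also produces the list of floats the question asks for. The helper name parse_numbers is hypothetical, and it assumes the input contains only numbers, spaces and square brackets:

```python
import re

def parse_numbers(text):
    """Hypothetical helper: strip outer brackets/spaces, split, convert to float."""
    trimmed = text.strip('[] ')
    if not trimmed:
        # re.split on an empty string returns [''], which float() would reject
        return []
    return [float(tok) for tok in re.split(r'[\[\] ]+', trimmed)]

print(parse_numbers("[1 2 3 4][2 3 4 5]"))
print(parse_numbers("1.5 2][3"))
```

Stripping first means no delimiter can sit at either end of the string, so no empty strings need to be filtered out afterwards.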

Upvotes: 1

Federico Piazza

Reputation: 31035

You can use regex to capture the content you want instead of splitting the string. You can use this regex:

(\d+)

Python code:

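import re
# r'...' works on both Python 2 and 3; the ur'...' prefix is a syntax error on Python 3
p = re.compile(r'(\d+)')
test_str = "[1 2 3 4][2 3 4 5]"

print(p.findall(test_str))
# => ['1', '2', '3', '4', '2', '3', '4', '5']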
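Since the question also mentions floats, here is a hedged sketch of the same capture approach with a pattern that accepts an optional fractional part. The pattern and the sample input are assumptions for illustration, not part of the original answer:

```python
import re

# Assumed pattern: digits with an optional fractional part, e.g. "3" or "2.5".
NUMBER = re.compile(r'\d+(?:\.\d+)?')

test_str = "[1 2.5 3][4 0.25]"  # hypothetical input containing floats
values = [float(m) for m in NUMBER.findall(test_str)]
print(values)  # [1.0, 2.5, 3.0, 4.0, 0.25]
```

The group is non-capturing, so findall returns the whole match rather than only the fractional part.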

Upvotes: 0
