Reputation: 193
I'm trying to take a string of ints and/or floats and create a list of floats. The string contains brackets that need to be ignored. I'm using re.split, but if my string begins and ends with a bracket, I get extra empty strings. Why is that?
Code:
import re
x = "[1 2 3 4][2 3 4 5]"
y = "1 2 3 4][2 3 4 5"
p = re.compile(r'[^\d\.]+')
print(p.split(x))
print(p.split(y))
Output:
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
['1', '2', '3', '4', '2', '3', '4', '5']
Upvotes: 15
Views: 5575
Reputation: 785551
You can just use filter to drop the empty results (in Python 3, wrap it in list(), since filter returns an iterator):
import re
x = "[1 2 3 4][2 3 4 5]"
print(list(filter(None, re.split(r'[^\d.]+', x))))
# => ['1', '2', '3', '4', '2', '3', '4', '5']
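Since the question asks for a list of floats, the filtering step can be combined with the conversion. This is only a minimal sketch: the helper name parse_floats is made up for illustration, and it assumes every non-empty token is a valid number (a stray "." on its own would make float() raise ValueError).

```python
import re

def parse_floats(text):
    """Split on anything that is not a digit or a dot, drop empty tokens, convert."""
    tokens = re.split(r'[^\d.]+', text)
    # `if tok` skips the empty strings produced by leading/trailing delimiters
    return [float(tok) for tok in tokens if tok]

print(parse_floats("[1 2 3 4][2 3 4 5]"))
print(parse_floats("[1.5 2][3 4.25]"))
```

The list comprehension does the same job as filter(None, ...) and converts in the same pass.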
Upvotes: 1
Reputation: 5289
If you use re.split, then a delimiter at the beginning or end of the string produces an empty string at the beginning or end of the resulting list.
If you don't want this, use re.findall with a regex that matches every sequence NOT containing delimiters.
Example:
import re
a = '[1 2 3 4]'
print(re.split(r'[^\d]+', a))
print(re.findall(r'[\d]+', a))
Output:
['', '1', '2', '3', '4', '']
['1', '2', '3', '4']
As others have pointed out in their answers, this may not be the perfect solution for this problem, but it is a general answer to the problem described in the title of the question, which I also had to solve when I found this question using Google.
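For the floats mentioned in the question, the same findall idea can be extended with a pattern that also accepts a decimal part. The following is a rough sketch, not part of the original answer: the names NUMBER and find_floats are hypothetical, and the pattern assumes numbers look like 12 or 12.5 (no signs, no exponents, no leading dot).

```python
import re

# Assumed number format: digits, optionally followed by a dot and more digits.
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def find_floats(text):
    """Collect every number in the text and convert it to float."""
    # The group is non-capturing, so findall returns the whole match each time
    return [float(m) for m in NUMBER.findall(text)]

print(find_floats("[1 2.5 3][40 5.75]"))
```

Because findall only returns what matches, there are no empty strings to clean up, whatever the string starts or ends with.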
Upvotes: 10
Reputation: 107347
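A more Pythonic way is to use a list comprehension with the str.isdigit() method to check whether each character is a digit. Note that this works character by character, so it only suits single-digit values; a number such as 12 or 2.5 would be broken apart: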
>>> [i for i in y if i.isdigit()]
['1', '2', '3', '4', '2', '3', '4', '5']
As for your code, you first need to split on spaces or brackets, which can be done with [\[\] ]. To get rid of the empty strings caused by leading and trailing brackets, you can first strip your string:
>>> y = "1 2 3 4][2 3 4 5"
>>> re.split(r'[\[\] ]+',y)
['1', '2', '3', '4', '2', '3', '4', '5']
>>> y = "[1 2 3 4][2 3 4 5]"
>>> re.split(r'[\[\] ]+',y)
['', '1', '2', '3', '4', '2', '3', '4', '5', '']
>>> re.split(r'[\[\] ]+',y.strip('[]'))
['1', '2', '3', '4', '2', '3', '4', '5']
You can also wrap the result in the filter function, using bool as the predicate (in Python 3, wrap it in list() to see the values):
>>> list(filter(bool, re.split(r'[\[\] ]+', y)))
['1', '2', '3', '4', '2', '3', '4', '5']
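One thing to watch with the strip approach: it only helps if every character the regex treats as a delimiter is also in the strip set. The sketch below uses a made-up input with spaces around the brackets to show the difference; DELIMS and the variable names are illustrative only.

```python
import re

DELIMS = re.compile(r'[\[\] ]+')

s = " [1 2 3 4][2 3 4 5] "  # hypothetical input with spaces outside the brackets

# strip('[]') removes nothing here, because the outermost characters are spaces
only_brackets = DELIMS.split(s.strip('[]'))

# stripping every delimiter character leaves no delimiter at either end
all_delims = DELIMS.split(s.strip('[] '))

print(only_brackets)
print(all_delims)
```

Here the first split still starts and ends with an empty string, while the second does not. Filtering the result, as shown above, avoids having to keep the two character sets in sync.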
Upvotes: 1
Reputation: 31035
You can use regex to capture the content you want instead of splitting the string. You can use this regex:
(\d+)
Python code:
import re
p = re.compile(r'(\d+)')
test_str = "[1 2 3 4][2 3 4 5]"
print(p.findall(test_str))
# => ['1', '2', '3', '4', '2', '3', '4', '5']
Upvotes: 0