ragardner

Reputation: 1975

Python regular expression split string into numbers and text/symbols

I would like to split a string into sections of numbers and sections of text/symbols. My current code doesn't handle negative numbers or decimals, and it behaves oddly, adding an empty string at the end of the output list.

import re
mystring = 'AD%5(6ag 0.33--9.5'
newlist = re.split('([0-9]+)', mystring)
print (newlist)

current output:

['AD%', '5', '(', '6', 'ag ', '0', '.', '33', '--', '9', '.', '5', '']

desired output:

['AD%', '5', '(', '6', 'ag ', '0.33', '-', '-9.5']

Upvotes: 3

Views: 8567

Answers (3)

Wiktor Stribiżew

Reputation: 627507

Your regex captures one or more digits, so each run of digits acts as a delimiter and, because of the capturing group, is also added to the resulting list together with the parts before and after it. If the string ends with digits, the part after the last match is an empty string, and that empty string is appended to the resulting list.
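A minimal sketch of that behaviour, using two made-up input strings (`'ab12cd'` and `'ab12'` are illustrative, not from the question):

```python
import re

# The capturing group keeps the delimiters (the digit runs) in the result.
middle = re.split(r'([0-9]+)', 'ab12cd')
# When the string ends with a match, the text after it is the empty string.
at_end = re.split(r'([0-9]+)', 'ab12')

print(middle)  # ['ab', '12', 'cd']
print(at_end)  # ['ab', '12', '']
```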

You may split with a regex that matches float or integer numbers with an optional minus sign and then remove empty values:

import re
s = 'AD%5(6ag 0.33--9.5'
result = re.split(r'(-?\d*\.?\d+)', s)
result = list(filter(None, result))  # drop empty strings; list() is needed on Python 3

To match negative/positive numbers with exponents, use

r'([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)'
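As a rough illustration, the exponent-aware pattern can be used with the same split-and-filter approach. The sample string below is an assumption made up for this sketch:

```python
import re

# Hypothetical input containing scientific notation and explicit signs.
sample = 'x=1.5e-3 y=+2E10 z=-7'
pattern = r'([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)'

# Split on numbers (kept because of the capturing group), then drop empty strings.
tokens = [t for t in re.split(pattern, sample) if t]
print(tokens)
# ['x=', '1.5e-3', ' y=', '+2E10', ' z=', '-7']
```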

The -?\d*\.?\d+ regex matches:

  • -? - an optional minus
  • \d* - zero or more digits
  • \.? - an optional literal dot
  • \d+ - one or more digits.
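The breakdown above can be checked with `re.fullmatch`, which only succeeds when the whole string matches. The sample strings are chosen for illustration; the point is that the leading digits and the dot are optional, while at least one trailing digit is required:

```python
import re

number = re.compile(r'-?\d*\.?\d+')

samples = ['5', '0.33', '-9.5', '.5', '5.', '-']
# Keep only the strings that the pattern matches in full.
accepted = [text for text in samples if number.fullmatch(text)]
print(accepted)
# ['5', '0.33', '-9.5', '.5']
```

Note that `'5.'` (a dot with no digits after it) and a bare `'-'` are rejected.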

Upvotes: 4

OneExceptionAtATime

Reputation: 11

As mentioned here before, re.split() has no option to ignore empty strings, but you can easily build a new list without them:

import re

mystring = "AD%5(6ag0.33--9.5"
newlist = [x for x in re.split(r'(-?\d+\.?\d*)', mystring) if x != '']
print(newlist)

output:

['AD%', '5', '(', '6', 'ag', '0.33', '-', '-9.5']

Upvotes: 1

Jan

Reputation: 43199

Unfortunately, re.split() does not offer an "ignore empty strings" option. However, to retrieve your numbers, you could easily use re.findall() with a different pattern:

import re

string = "AD%5(6ag0.33-9.5"
rx = re.compile(r'-?\d+(?:\.\d+)?')
numbers = rx.findall(string)

print(numbers)
# ['5', '6', '0.33', '-9.5']
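If the text between the numbers is needed as well, one possible extension (not part of the answer above, only a sketch) is to add a second alternative to the same pattern so that re.findall() returns both kinds of token. The second alternative and the `re.DOTALL` flag are assumptions of this sketch; the input is the string from the question:

```python
import re

mystring = 'AD%5(6ag 0.33--9.5'

# First alternative: a number (the pattern from the answer above).
# Second alternative: a run of characters, none of which starts a number.
token = re.compile(r'-?\d+(?:\.\d+)?|(?:(?!-?\d).)+', re.DOTALL)
parts = token.findall(mystring)

print(parts)
# ['AD%', '5', '(', '6', 'ag ', '0.33', '-', '-9.5']
```

Because findall() only returns what the pattern matched, no empty strings appear and no filtering step is needed.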

Upvotes: 2
