CptSupermrkt

Reputation: 7134

Why is re.split() returning an extra empty string at the end of my resulting list?

import re
print(re.split("([0-9]{4})", "Spring2014"))

results in

['Spring', '2014', '']

Where is that extra '' coming from at the end? My desired list is the above, without that extra empty item at the end. It's easy enough to just discard the extra item, but I'd like to understand why re.split is including it.

Upvotes: 3

Views: 444

Answers (1)

Martijn Pieters

Reputation: 1123420

You asked re.split() to split the text wherever four digits occur. The text before the '2014' match is 'Spring', and the text after the match is the empty string ''.

This is documented behaviour:

If there are capturing groups in the separator and it matches at the start of the string, the result will start with an empty string. The same holds for the end of the string:

>>> re.split(r'(\W+)', '...words, words...')
['', '...', 'words', ', ', 'words', '...', '']

That way, separator components are always found at the same relative indices within the result list (e.g., if there’s one capturing group in the separator, the 0th, the 2nd and so forth).
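
If the trailing empty string is unwanted, a minimal sketch of two common workarounds follows. The variable names are illustrative only, and the `[A-Za-z]+` pattern in the second option assumes the input is always letters followed by exactly four digits, which the question does not state.

```python
import re

text = "Spring2014"

# The separator matches at the very end of the string,
# so the piece that follows it is the empty string.
parts = re.split(r"([0-9]{4})", text)
print(parts)  # ['Spring', '2014', '']

# Without the capturing group the separator itself is dropped,
# but the trailing empty piece is still there.
print(re.split(r"[0-9]{4}", text))  # ['Spring', '']

# Option 1: filter out empty strings after splitting.
non_empty = [p for p in parts if p]
print(non_empty)  # ['Spring', '2014']

# Option 2 (assumes a letters-then-four-digits layout):
# match both pieces directly instead of splitting.
m = re.fullmatch(r"([A-Za-z]+)([0-9]{4})", text)
season, year = m.groups() if m else (None, None)
print(season, year)  # Spring 2014
```

Filtering is the smaller change, but it also removes empty strings from the middle or start of the result, which breaks the fixed relative indices described in the quote. Matching the pieces directly avoids that, at the cost of committing to a specific input shape.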

Upvotes: 3
