re.split with spaces in python

Question

I have a string of text that looks like this:

'                     19,301         14,856        18,554'

where the gaps are runs of space characters.

I'm trying to split it on the white space, but I need to retain all of the white space as an item in the new list. Like this:

['                     ', '19,301', '        ', '14,856', '        ', '18,554']

I have been using the following code:

re.split(r'( +)(?=[0-9])', item)

and it returns:

['', '                     ', '19,301', '        ', '14,856', '        ', '18,554']

Notice that it always adds an empty element to the beginning of my list. It's easy enough to drop it, but I'm really looking to understand what is going on here, so I can get the code to treat things consistently. Thanks.

lextoumbourou · Accepted Answer

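When using re.split, if the separator matches at the start of the string, the "result will start with an empty string". This happens with or without a capture group; the group only controls whether the matched separators are also included in the list. The reason for the empty string is so that the join method can behave as the inverse of split.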

It might not seem useful in your case, where the separator matches vary in length, but consider the case where the separator is a single | character and you want to join the pieces back together. With the extra empty string, the round trip works:

>>> import re
>>> item = '|19,301|14,856|18,554'
>>> items = re.split(r'\|', item)
>>> print(items)
['', '19,301', '14,856', '18,554']
>>> '|'.join(items)
'|19,301|14,856|18,554'

But without it, the initial pipe would be missing:

>>> items = ['19,301', '14,856', '18,554']
>>> '|'.join(items)
'19,301|14,856|18,554'
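
The same round-trip idea applies to the pattern from the question. Here is a minimal sketch, assuming a shortened sample string (the space counts are illustrative, not the asker's exact data). Because the separator is captured, joining the pieces with an empty string rebuilds the input, and the leading '' is just the (empty) text before the first separator. If you only want the non-empty pieces, filtering them out or switching to re.findall are two possible options:

```python
import re

# Shortened, illustrative stand-in for the string in the question.
item = '   19,301  14,856  18,554'

# Pattern from the question: capture runs of spaces that precede a digit.
parts = re.split(r'( +)(?=[0-9])', item)
print(parts)
# ['', '   ', '19,301', '  ', '14,856', '  ', '18,554']

# The captured separators are kept, so an empty-string join restores the input.
print(''.join(parts) == item)
# True

# Option 1: drop the empty strings after splitting.
non_empty = [p for p in parts if p]
print(non_empty)
# ['   ', '19,301', '  ', '14,856', '  ', '18,554']

# Option 2: match runs of spaces or runs of non-spaces directly,
# which never produces empty strings.
tokens = re.findall(r' +|[^ ]+', item)
print(tokens)
# ['   ', '19,301', '  ', '14,856', '  ', '18,554']
```

Note that option 2 treats every run of spaces as a token, not only those followed by a digit, so it is a slightly different rule than the original pattern.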
