Empty matches for the split function in the Python re module

Question

I'm just wondering how the second empty string ended up in the result. Could anyone explain what happens, step by step?

>>> import re
>>> re.split(r'\W*', '...words...')
['', '', 'w', 'o', 'r', 'd', 's', '', '']

If I'm not wrong, the first empty string is explained by this sentence from the Python re module documentation:

If it matches at the start of the string, the result will start with an empty string. The same holds for the end of the string

Wiktor Stribiżew · Accepted Answer

See the regex demo at regex101, which shows where the matches occur. Recall that re.split splits the string at each match (here most matches are empty strings, that is, just locations in the string), and you can easily see where each split occurs:

The leading ... is matched and the split occurs => ['', 'words...']
The w is reached, so \W* matches the empty string in front of it => ['', '', 'words...']
The o is reached, so \W* matches the empty string in front of it => ['', '', 'w', 'ords...']
The r is reached, so \W* matches the empty string in front of it => ['', '', 'w', 'o', 'rds...']
The d is reached, so \W* matches the empty string in front of it => ['', '', 'w', 'o', 'r', 'ds...']
The s is reached, so \W* matches the empty string in front of it => ['', '', 'w', 'o', 'r', 'd', 's...']
The trailing ... is matched => ['', '', 'w', 'o', 'r', 'd', 's', ''] (note that the last '' is not just an empty string: it is the empty remainder at the end-of-string position, where a match is still possible)
The end of the string is reached, so \W* matches the empty string at this location => ['', '', 'w', 'o', 'r', 'd', 's', '', ''].
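
The steps above can be checked with a small sketch that lists every match of \W* and then rebuilds the split result by hand. It assumes Python 3.7 or later, where re.split splits on empty matches; older versions handle empty matches differently and give a different result. The names text, matches, and pieces are only illustrative.

```python
import re

text = '...words...'
pattern = re.compile(r'\W*')

# Every match of the pattern is a split point; show where each one is.
matches = list(pattern.finditer(text))
for m in matches:
    print(m.span(), repr(m.group()))

# Rebuild the result manually: keep the text between consecutive matches.
pieces = []
last_end = 0
for m in matches:
    pieces.append(text[last_end:m.start()])
    last_end = m.end()
pieces.append(text[last_end:])  # whatever is left after the last match

print(pieces)
print(pieces == pattern.split(text))
```

There are eight matches, so the string is cut into nine pieces. The second '' is the zero-length slice between the end of the leading ... match and the empty match right before w, both of which sit at index 3.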
