S.B

Reputation: 16526

Empty matches for the split function in python re module

I'm just wondering how the second empty string ended up in the result. Could anyone explain what happens, step by step?

>>> re.split(r'\W*', '...words...')
['', '', 'w', 'o', 'r', 'd', 's', '', '']

If I'm not wrong, the first empty string is there because of this sentence from the Python re module documentation:

If it matches at the start of the string, the result will start with an empty string. The same holds for the end of the string

Upvotes: 4

Views: 103

Answers (1)

Wiktor Stribiżew

Reputation: 627082

See the regex demo at regex101, which shows where the matches occur. Recall that re.split splits the string at every match (here, most of the matches are empty strings, that is, just positions in the string), so you can see where each split happens:

  • The leading ... is matched and the split occurs => ['', 'words...']
  • At w, \W* matches the empty string in front of it => ['', '', 'words...']
  • At o, \W* matches the empty string in front of it => ['', '', 'w', 'ords...']
  • At r, \W* matches the empty string in front of it => ['', '', 'w', 'o', 'rds...']
  • At d, \W* matches the empty string in front of it => ['', '', 'w', 'o', 'r', 'ds...']
  • At s, \W* matches the empty string in front of it => ['', '', 'w', 'o', 'r', 'd', 's...']
  • The trailing ... is matched => ['', '', 'w', 'o', 'r', 'd', 's', ''] (note that the last '' is not just any empty string: it sits at the end-of-string position, which can still be matched)
  • At the end of the string, \W* matches that position as well => ['', '', 'w', 'o', 'r', 'd', 's', '', ''].
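
The steps above can be checked with a minimal sketch that lists every match span and rebuilds the split result from them. The helper name split_by_matches is hypothetical and only for illustration. It ignores capturing groups and maxsplit, and it assumes Python 3.7 or later, where re.split also splits on empty matches.

```python
import re

def split_by_matches(pattern, text):
    """Rebuild the result of re.split from the match spans (illustrative helper)."""
    pieces = []
    last_end = 0
    for m in re.finditer(pattern, text):
        # Everything between the previous match and this one becomes a piece.
        pieces.append(text[last_end:m.start()])
        last_end = m.end()
    # Whatever remains after the last match is the final piece.
    pieces.append(text[last_end:])
    return pieces

text = '...words...'

# Spans of all matches: the two '...' runs plus the empty matches.
spans = [m.span() for m in re.finditer(r'\W*', text)]
print(spans)
# On Python 3.7+:
# [(0, 3), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 11), (11, 11)]

print(split_by_matches(r'\W*', text))
print(re.split(r'\W*', text))
# Both print: ['', '', 'w', 'o', 'r', 'd', 's', '', '']
```

The span (3, 3) right after the leading ... is what produces the second empty string: the piece between the end of the first match (position 3) and the start of the next match (also position 3) is empty.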

Upvotes: 1
