Reputation: 150
When using `re.split`, I'd expect `maxsplit` to be the length of the returned list minus 1 (that is, the list should have at most `maxsplit + 1` elements). The examples in the docs suggest so. But when there is a capture group (and maybe in some other cases), I don't understand how the `maxsplit` argument works.
```python
>>> import re
>>> re.split(r"(\W+)", "Words, words, words.", maxsplit=1)
['Words', ', ', 'words, words.']
>>> re.split("(:)", ":a:b::c", maxsplit=2)
['', ':', 'a', ':', 'b::c']
>>> re.split("((:))", ":a:b::c", maxsplit=2)
['', ':', ':', 'a', ':', ':', 'b::c']
```
What am I missing?
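For comparison, here is a minimal sketch of the case the docs examples describe, where the expectation does hold: with no capturing group in the pattern, each split adds exactly one element, so the list has `maxsplit + 1` elements when there are enough separators. Note that `maxsplit=0` (the default) means "no limit", not "no splits".

```python
import re

text = ":a:b::c"

# No capturing group: every split contributes exactly one new piece.
results = {n: re.split(":", text, maxsplit=n) for n in range(1, 4)}
for n, parts in results.items():
    print(n, parts, len(parts))

# maxsplit=0 is the default and means "split everywhere".
print(re.split(":", text, maxsplit=0))
```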
Upvotes: 0
Views: 1043
Reputation: 363
So my guess is that `maxsplit` determines the number of splits, and the parentheses return additional groups.

Example: with `maxsplit=2`, `":a:b::c"` is split into three parts:
`"", "a", "b::c"`
But because the pattern `"(:)"` also contains a capturing group, the captured text is returned between the parts:
`"", ":", "a", ":", "b::c"`
If the pattern is `"((:))"`, each colon is returned twice between the parts, once for each of the two groups.
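This reading can be turned into a rule: with `s` actual splits and `k` capturing groups, the result has `s + 1` pieces plus `s * k` group entries. A minimal sketch checking that rule against the question's examples; `expected_length` is a hypothetical helper (not part of the `re` module), and it assumes the pattern never matches the empty string:

```python
import re

def expected_length(pattern: str, text: str, maxsplit: int) -> int:
    """Hypothetical helper: predict len(re.split(pattern, text, maxsplit=maxsplit)).

    Assumes the pattern never matches the empty string.
    """
    compiled = re.compile(pattern)
    matches = sum(1 for _ in compiled.finditer(text))
    # maxsplit caps the number of splits; 0 means "no limit".
    splits = min(matches, maxsplit) if maxsplit else matches
    # splits + 1 pieces, plus one entry per capturing group for each split.
    return (splits + 1) + splits * compiled.groups

cases = [
    (r"(\W+)", "Words, words, words.", 1),
    ("(:)", ":a:b::c", 2),
    ("((:))", ":a:b::c", 2),
]
for pattern, text, n in cases:
    actual = re.split(pattern, text, maxsplit=n)
    assert len(actual) == expected_length(pattern, text, n)
    print(pattern, actual)
```

So `maxsplit` still limits the number of splits; the capturing groups only add entries between the pieces.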
Upvotes: 1
Reputation: 145458
It's not about `maxsplit`, it's about your use of parentheses in the regular expression. From the docs:
> If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
DOCS: https://docs.python.org/3/library/re.html#re.split
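If you need parentheses for grouping but don't want the extra list entries, a non-capturing group `(?:...)` splits at the same places without adding anything, so the result is back to at most `maxsplit + 1` elements. A minimal sketch using the question's first example, which also shows that with exactly one capturing group the pieces and separators alternate and can be separated again by slicing:

```python
import re

text = "Words, words, words."

# Capturing group: the matched separator text is added between the pieces.
with_capture = re.split(r"(\W+)", text, maxsplit=1)
# -> ['Words', ', ', 'words, words.']

# Non-capturing group (?:...): same split points, no extra list entries.
without_capture = re.split(r"(?:\W+)", text, maxsplit=1)
# -> ['Words', 'words, words.']

# With exactly one capturing group, pieces and separators alternate.
pieces = with_capture[::2]       # ['Words', 'words, words.']
separators = with_capture[1::2]  # [', ']
```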
Upvotes: 1