user9476284

Reputation: 150

Python re.split with maxsplit argument

When using re.split, I'd expect maxsplit to equal the length of the returned list minus 1.

The examples in the docs suggest so.

But when the pattern contains a capture group (and maybe in some other cases), I don't understand how the maxsplit argument works.

>>> re.split(r"(\W+)", "Words, words, words.", maxsplit=1)
['Words', ', ', 'words, words.']

>>> re.split("(:)", ":a:b::c", maxsplit=2)
['', ':', 'a', ':', 'b::c']
>>> re.split("((:))", ":a:b::c", maxsplit=2)
['', ':', ':', 'a', ':', ':', 'b::c']

What am I missing?

Upvotes: 0

Views: 1043

Answers (2)

Martin

Reputation: 363

My guess is that maxsplit sets the maximum number of splits, and each capturing group in the pattern adds its matched text to the result.

Example
":a:b::c" with maxsplit=2 splits your string into three parts:
"", "a", "b::c"

But because the pattern "(:)" also contains a capturing group, the captured text is returned in between the parts: "", ":", "a", ":", "b::c"

If the pattern is "((:))", there are two capturing groups, so each colon is returned twice in between the parts.
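
To make the counting concrete, here is a rough sketch of the arithmetic. The helper name expected_length is made up for illustration (it is not part of the re module), and it assumes the pattern never matches an empty string:

```python
import re

def expected_length(pattern, text, maxsplit=0):
    # Hypothetical helper for illustration, not part of the re module.
    compiled = re.compile(pattern)
    # Count how many times the separator matches in the text.
    matches = sum(1 for _ in compiled.finditer(text))
    # maxsplit=0 means "no limit"; otherwise it caps the number of splits.
    splits = matches if maxsplit == 0 else min(matches, maxsplit)
    # splits + 1 parts, plus one extra item per capturing group per split.
    return (splits + 1) + splits * compiled.groups

for pattern in ("(:)", "((:))", ":"):
    result = re.split(pattern, ":a:b::c", maxsplit=2)
    print(pattern, len(result), expected_length(pattern, ":a:b::c", maxsplit=2))
```

So the length is maxsplit + 1 only when the pattern has no capturing groups; each group adds one more item per split.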

Upvotes: 1

VisioN

Reputation: 145458

It's not about maxsplit; it's about the parentheses in your regular expression:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

DOCS: https://docs.python.org/3/library/re.html#re.split
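
If the separators aren't wanted in the result, one option is to drop the parentheses or make the group non-capturing with (?:...). A small sketch using the string from the question; the variable names are just for illustration:

```python
import re

text = ":a:b::c"

# Capturing group: the matched separators are included in the result.
with_separators = re.split("(:)", text, maxsplit=2)

# Non-capturing group: only the parts between separators are returned.
without_separators = re.split("(?::)", text, maxsplit=2)

# With exactly one capturing group, every second item is a part.
parts_only = with_separators[::2]

print(with_separators)     # ['', ':', 'a', ':', 'b::c']
print(without_separators)  # ['', 'a', 'b::c']
print(parts_only)          # ['', 'a', 'b::c']
```

The slicing trick only works when the pattern has exactly one capturing group; with more groups the step would need to change accordingly.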

Upvotes: 1
