Binwei Fang

Reputation: 13

Python re.split(): unexpected result

I'm a Python beginner, and I'm confused by the output of re.split():

text='alpha, beta,,,gamma dela'
In [9]: re.split('(,)+',text)
Out[9]: ['alpha', ',', ' beta', ',', 'gamma dela']

In [11]: re.split('(,+)',text)
Out[11]: ['alpha', ',', ' beta', ',,,', 'gamma dela']

In [7]: re.split('[,]+',text)
Out[7]: ['alpha', ' beta', 'gamma dela']

Why are these outputs different? Please help me, thank you very much!

Upvotes: 1

Views: 173

Answers (1)

willeM_ Van Onsem

Reputation: 477437

As specified in the documentation of re.split:

re.split(pattern, string, maxsplit=0, flags=0)

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.
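A minimal sketch of both parts of that description, using the question's text: a capturing pattern returns the separator text, maxsplit=1 stops after the first split, and a group that did not take part in a particular match contributes None. The string 'a,b c' is just an illustrative input. maxsplit is passed by keyword here, since passing it positionally is deprecated as of Python 3.13.

```python
import re

text = 'alpha, beta,,,gamma dela'

# With a capturing group, each separator's captured text is kept in the result.
full = re.split('(,+)', text)
print(full)  # ['alpha', ',', ' beta', ',,,', 'gamma dela']

# maxsplit=1: split only at the first separator; the remainder is left unsplit.
first_only = re.split('(,+)', text, maxsplit=1)
print(first_only)  # ['alpha', ',', ' beta,,,gamma dela']

# Two alternative groups: the one that did not participate in a match yields None.
both = re.split('(,)|( )', 'a,b c')
print(both)  # ['a', ',', None, 'b', None, ' ', 'c']
```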

A capture group is usually written as a pair of parentheses, (...), where the opening parenthesis is not followed by ?: or by a lookahead/lookbehind marker such as ?= or ?<=. So each of the first two regexes has one capture group:

  (,)+
# ^ ^
  (,+)
# ^  ^
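One way to check this is the groups attribute of a compiled pattern, which counts its capturing groups. The sketch below also assumes (?:,)+ as an extra comparison pattern not in the question: turning the group into a non-capturing one makes the split behave exactly like [,]+.

```python
import re

text = 'alpha, beta,,,gamma dela'
patterns = ['(,)+', '(,+)', '(?:,)+', '[,]+']

# Pattern.groups is the number of capturing groups in the compiled pattern.
group_counts = {p: re.compile(p).groups for p in patterns}
print(group_counts)  # {'(,)+': 1, '(,+)': 1, '(?:,)+': 0, '[,]+': 0}

# A non-capturing group (?:...) splits exactly like the character class [,]+.
print(re.split('(?:,)+', text))  # ['alpha', ' beta', 'gamma dela']
```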

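In the first case, (,)+, the group matches a single comma and the + repeats the whole group. When a group is repeated, only the text of its last repetition is kept, so the captured text is always a single comma, even where the overall match is ,,,. In the second case, (,+), the + is inside the group, so the group itself matches the whole run of commas (+ is greedy and takes as many commas as it can, here all three).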
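The "last repetition wins" rule can be seen directly on match objects. The inputs 'abc' and 'beta,,,gamma' are illustrative strings chosen for this sketch; the span of group 1 shows that it refers to the final comma only.

```python
import re

# ([a-c])+ repeats a one-character group; only the last repetition is kept.
m = re.fullmatch('([a-c])+', 'abc')
print(m.group(0), m.group(1))  # abc c

# The comma case: the whole match is ',,,', but group 1 holds only the
# final comma, at index 6 of 'beta,,,gamma'.
rep = re.search('(,)+', 'beta,,,gamma')
print(rep.group(0), rep.span(1))  # ,,, (6, 7)

# With the + inside the group, the group itself holds the whole run.
inner = re.search('(,+)', 'beta,,,gamma')
print(inner.group(1))  # ,,,
```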

In the last case, [,]+, there is no capture group, so the string is split at each run of commas and the matched separators are discarded rather than returned.
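To tie the three cases together, here is a rough approximation of re.split() built on re.finditer(). The helper name split_like_re is hypothetical, it ignores maxsplit and flags, and it is only checked here against patterns that cannot match an empty string. It illustrates the rule described above, not how CPython actually implements re.split().

```python
import re

def split_like_re(pattern, string):
    """Hypothetical helper approximating re.split(pattern, string), without maxsplit/flags."""
    pieces = []
    last_end = 0
    for m in re.finditer(pattern, string):
        pieces.append(string[last_end:m.start()])  # text before this separator
        pieces.extend(m.groups())  # one entry per capturing group; nothing if there are none
        last_end = m.end()
    pieces.append(string[last_end:])  # text after the last separator
    return pieces

text = 'alpha, beta,,,gamma dela'
for pattern in ['(,)+', '(,+)', '[,]+']:
    print(pattern, split_like_re(pattern, text))
```

Because m.groups() holds the last repetition of each group, (,)+ contributes a single comma per separator, (,+) contributes the full run, and [,]+ contributes nothing.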

Upvotes: 2
