Reputation: 147
So I have a question about the following piece of code:
import re

def OnChanMsg(self, nick, channel, message):
    if 'Username' in nick.GetNick():
        stripped = message.s.strip()  # strips leading and trailing whitespace
        # pattern that matches IRC formatting characters (color codes, bold, etc.)
        regex = re.compile(r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
        ircstripped = regex.sub("", stripped)  # the message minus the formatting
        all = re.findall(r'test\ for\ (.*)\: ->\ (.*)\ \((.*)\)\ -\ \((.*)\)\ - \((.*)\).*', ircstripped)
So my question is the following:
1) What the code does is relatively clear to me, with the exception of the "(?:\d{1,2}(?:,\d{1,2})?)?" part. I just don't understand what it does or how it works. I checked the Google Developers code school videos and the Python documentation, but given that my goal is to strip an IRC message of its colors and other formatting, what exactly does this part do, in layman's terms if possible?
I found this inside the thread: How to strip color codes used by mIRC users?
(?: ... ) says to forget about storing what was found in the parenthesis (as we don't need to backreference it), ? means to match 0 or 1 and {n,m} means to match n to m of the previous grouping. Finally, \d means to match [0-9].
But I'm not really getting it =/
Upvotes: 1
Views: 155
Reputation: 8492
http://myregextester.com to the rescue!
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                    group, but do not capture (optional
                         (matching the most amount possible)):
----------------------------------------------------------------------
    \d{1,2}              digits (0-9) (between 1 and 2 times
                         (matching the most amount possible))
----------------------------------------------------------------------
    (?:                  group, but do not capture (optional
                         (matching the most amount possible)):
----------------------------------------------------------------------
      ,                  ','
----------------------------------------------------------------------
      \d{1,2}            digits (0-9) (between 1 and 2 times
                         (matching the most amount possible))
----------------------------------------------------------------------
    )?                   end of grouping
----------------------------------------------------------------------
  )?                     end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
So, in other words: optionally match 1-2 digits, optionally followed by a group consisting of a comma and 1-2 digits. Nothing is captured, since both groups are non-capturing.
So the following would match (assuming a whole-line match):
12
1
20
10,2
22,3
12,0
14,20
but the following wouldn't:
200
a,b
!123p9
1000,2000
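If you want to check the two lists above yourself, here is a minimal sketch using Python's re.fullmatch to force the whole-line match. The names COLOR_ARGS and matches_whole are made up for this example and are not part of the original code.

```python
import re

# Only the fragment from the question, as a raw string so that \d
# reaches the regex engine unchanged.
COLOR_ARGS = re.compile(r"(?:\d{1,2}(?:,\d{1,2})?)?")

def matches_whole(text):
    """Return True if the entire string matches the fragment."""
    return COLOR_ARGS.fullmatch(text) is not None

should_match = ["12", "1", "20", "10,2", "22,3", "12,0", "14,20"]
should_not_match = ["200", "a,b", "!123p9", "1000,2000"]

for s in should_match + should_not_match:
    print(repr(s), matches_whole(s))
```

Because the outer group is optional, the empty string also counts as a whole-line match. That is why, without the whole-line assumption, the fragment never fails: it simply matches nothing.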
Upvotes: 2
Reputation: 46872
(?:\d{1,2}(?:,\d{1,2})?)?
matches zero, one, or two numbers of 1 or 2 digits each; when there are two, they are separated by a comma.
(?:\d{1,2}(?:,\d{1,2})?)? = (?:\d{1,2}(?:,\d{1,2})?) followed by ?
                          = the whole thing is optional
(?:\d{1,2}(?:,\d{1,2})?)  = \d{1,2}(?:,\d{1,2})? in a group that is not stored
\d{1,2}(?:,\d{1,2})?      = \d{1,2} followed by (?:,\d{1,2})?
                          = 1 or 2 digits followed by (?:,\d{1,2})?
(?:,\d{1,2})?             = (?:,\d{1,2}) followed by ?
                          = (?:,\d{1,2}) is optional
(?:,\d{1,2})              = ,\d{1,2} in a group that is not stored
,\d{1,2}                  = , (comma) followed by 1 or 2 digits
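To tie this back to the goal in the question: in the full pattern, this fragment comes right after \x03, the color control character, so it swallows the optional foreground number and the optional ",background" number that follow it. A rough sketch, assuming a made-up sample message and a hypothetical helper name strip_irc_formatting:

```python
import re

# Same pattern as in the question, written as a raw string.
IRC_FORMATTING = re.compile(
    r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?"
)

def strip_irc_formatting(message):
    """Remove formatting characters and color codes from a message."""
    return IRC_FORMATTING.sub("", message)

# Hypothetical message: \x03 followed by "04,01" is a foreground and
# background color, a bare \x03 ends the color, \x02 toggles bold and
# \x0f resets all formatting.
sample = "\x0304,01red on black\x03 and \x02bold\x02 and \x033green\x0f"
print(strip_irc_formatting(sample))  # red on black and bold and green
```

The {1,2} limit matters here: at most two digits after \x03 are treated as a color number, so in "\x0312345" only "12" is removed along with the control character and "345" stays in the text.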
Upvotes: 0