Python Regex re.compile clarification

Question

So I have a question about the following piece of code:

import re

def OnChanMsg(self, nick, channel, message):
    if 'Username' in nick.GetNick():
        stripped = message.s.strip()  # strip leading and trailing whitespace
        # Compile a pattern matching IRC formatting codes: underline (\x1f), bold (\x02),
        # \x12, reset (\x0f), reverse (\x16) and colour (\x03 plus optional digits).
        # A raw string keeps \d intact for the regex engine.
        regex = re.compile(r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)
        ircstripped = regex.sub("", stripped)  # delete every formatting code
        all = re.findall(r'test\ for\ (.*)\: ->\ (.*)\ $(.*)$\ -\ $(.*)$\ - $(.*)$.*', ircstripped)
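To try the stripping step outside ZNC, here is a minimal standalone sketch. It reuses the same alternation as the snippet above; the helper name `strip_irc_formatting` is hypothetical, and it mirrors the snippet's order (strip whitespace first, then remove codes).

```python
import re

# Same alternation as in the question, as a raw string so \d reaches the regex engine.
# re.UNICODE is already the default for str patterns in Python 3.
IRC_FORMATTING = re.compile(r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?")

def strip_irc_formatting(text: str) -> str:
    """Hypothetical helper: trim whitespace, then delete formatting control codes."""
    return IRC_FORMATTING.sub("", text.strip())

sample = "  \x02bold\x02 \x0304,12colored\x03 plain  "
print(strip_irc_formatting(sample))  # bold colored plain
```

Note that a bare `\x03` (colour reset) is also removed, because the whole digit part after it is optional.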

So my question is the following: what the code does is relatively clear to me, except for the "(?:\d{1,2}(?:,\d{1,2})?)?" part. I just don't understand what it does or how it works. I did check the Google Developers code school videos and the Python documentation, but given that my goal is to strip an IRC message of its colors and other formatting, what exactly does this part do, in layman's terms if possible?

I found this in the thread "How to strip color codes used by mIRC users?":

(?: ... ) says to forget about storing what was found in the parenthesis (as we don't need to backreference it), ? means to match 0 or 1 and {n,m} means to match n to m of the previous grouping. Finally, \d means to match [0-9].

But I'm not really getting it =/
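The building blocks in that quote can be tried one at a time. This is a small illustrative sketch using only the standard `re` module; the test strings are arbitrary. One refinement to the quote: for str patterns in Python 3, `\d` matches any Unicode decimal digit, and is exactly `[0-9]` only with `re.ASCII`.

```python
import re

# Capturing group: the digits are stored and can be read back as group 1.
print(re.fullmatch(r"(\d{1,2})", "42").groups())    # ('42',)

# Non-capturing group (?:...): same matching, but nothing is stored.
print(re.fullmatch(r"(?:\d{1,2})", "42").groups())  # ()

# {1,2}: one or two of the previous item, so three digits fail a full match.
print(re.fullmatch(r"\d{1,2}", "420"))              # None

# ?: the previous item (here a whole group) may appear zero times or once.
print(bool(re.fullmatch(r"5(?:,\d)?", "5")))        # True
print(bool(re.fullmatch(r"5(?:,\d)?", "5,7")))      # True

# \d is Unicode-aware by default; re.ASCII restricts it to [0-9].
print(bool(re.fullmatch(r"\d", "\u0663")))            # True (Arabic-Indic digit three)
print(bool(re.fullmatch(r"\d", "\u0663", re.ASCII)))  # False
```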

Christian Ternus · Accepted Answer

http://myregextester.com to the rescue!

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \d{1,2}                  digits (0-9) (between 1 and 2 times
                             (matching the most amount possible))
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      ,                        ','
----------------------------------------------------------------------
      \d{1,2}                  digits (0-9) (between 1 and 2 times
                               (matching the most amount possible))
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

So, in other words: optionally match 1-2 digits, optionally followed by a group consisting of a comma and 1-2 digits. Nothing is captured, because both groups are non-capturing.
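To see what this means for an actual mIRC colour code, here is a sketch that applies only the colour branch (`\x03` plus the optional digits) to a few made-up strings and prints which part would be deleted and which part survives.

```python
import re

# Only the colour-code branch of the question's pattern: \x03 plus the
# optional "fg" or "fg,bg" digits.
COLOR_CODE = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?")

for raw in ["\x034hi", "\x0304,12hi", "\x03,hi", "\x03hi", "\x03123hi"]:
    m = COLOR_CODE.match(raw)
    # m.group() is what sub("") would delete; raw[m.end():] is what survives.
    print(repr(m.group()), "->", repr(raw[m.end():]))
```

Two edge cases stand out: a comma with no foreground digits (`"\x03,hi"`) is not consumed, and a third digit (`"\x03123hi"`) is left in the text because `{1,2}` stops after two.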

So the following would match (assuming a whole-line match):

7
42
4,12
12,34

but the following wouldn't:

200
a,b
!123p9
1000,2000
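The non-matching strings above, plus a few that do match, can be checked with `re.fullmatch`, which plays the role of the "whole-line match" assumption. A minimal sketch; the matching examples are illustrative choices.

```python
import re

# Just the digits part of the pattern; fullmatch requires it to cover the whole string.
DIGITS = re.compile(r"(?:\d{1,2}(?:,\d{1,2})?)?")

should_match = ["", "7", "42", "4,12", "12,34"]
should_fail = ["200", "a,b", "!123p9", "1000,2000"]

for s in should_match + should_fail:
    print(repr(s), bool(DIGITS.fullmatch(s)))
```

The empty string matches too, since the entire group is optional; that is what lets a bare `\x03` be stripped.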
