Python Regex takes so long in some cases

Question

I compiled the following pattern

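import re

# Note: the group names below are illustrative placeholders.
pattern = re.compile(
    r"""
    (?P<date>.*?)
    \s*
    (?P<thread_id>\w+)
    \s*PACKET\s*
    (?P<identifier>\w+)
    \s*
    (?P<proto>\w+)
    \s*
    (?P<direction>\w+)
    \s*
    (?P<ip>\d+\.\d+\.\d+\.\d+)
    \s*
    (?P<xid>\w+)
    \s*
    (?P<qr>.*?)
    \s*\[
    (?P<flag_hex>[0-9]*)
    \s*
    (?P<flags>.*?)
    \s*
    (?P<status>\w+)
    \]\s*
    (?P<record>\w+)
    \s*
    \.(?P<domain>.*)\.
    """, re.VERBOSE
    )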

to work with this string

2/1/2014 9:34:29 PM 05EC PACKET 00000000025E97A0 UDP Snd 10.10.10.10 ebbe R Q [8381 DR NXDOMAIN] A (1)9(1)a(3)c-0(11)19-330ff801(7)e0400b1(4)15e0(4)1ca7(4)2f4a(3)210(1)0(26)841f75qnhp97z6jknf946qwfm5(4)avts(6)domain(3)com(0)

And it works successfully:

In [4]: pattern.findall(re.sub(r'\(\d+\)', '.', x))
Out[4]: 
[('2/1/2014 9:34:29 PM',
  '05EC',
  '00000000025E97A0',
  'UDP',
  'Snd',
  '10.10.10.10',
  'ebbe',
  'R Q',
  '8381',
  'DR',
  'NXDOMAIN',
  'A',
  '9.a.c-0.19-330ff801.e0400b1.15e0.1ca7.2f4a.210.0.841f75qnhp97z6jknf946qwfm5.avts.domain.com')]

The problem is that matching takes a very long time in some cases. Any idea how to improve the pattern so that it runs faster?

JDB · Accepted Answer

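Yep, you've got yourself a case of catastrophic backtracking, also known as an "evil regex". It shows up wherever a .* sits next to \s*, because both can match the same whitespace. (The group names shown are placeholders.) First, in the lazy group just before the opening bracket:

\s*
(?P<qr>.*?)
\s*

Again in the lazy group inside the brackets:

\s*
(?P<flags>.*?)
\s*

And in the final group:

\s*
\.(?P<domain>.*)\.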

Replacing .* with \S* should do the trick.
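
As a rough sketch of that idea (not a definitive rewrite), the pattern can be tightened so that neighbouring pieces can no longer match the same characters. One caveat: a bare \S* cannot match a field that contains a space, such as 'R Q' in the sample line, so this version assumes those fields never contain brackets and uses [^\[\]]*? for them. It also assumes the timestamp is always three whitespace-separated tokens. The group names are hypothetical.

```python
import re

# Sample line from the question.
line = (
    "2/1/2014 9:34:29 PM 05EC PACKET 00000000025E97A0 UDP Snd 10.10.10.10 "
    "ebbe R Q [8381 DR NXDOMAIN] A "
    "(1)9(1)a(3)c-0(11)19-330ff801(7)e0400b1(4)15e0(4)1ca7(4)2f4a(3)210(1)0"
    "(26)841f75qnhp97z6jknf946qwfm5(4)avts(6)domain(3)com(0)"
)

# Hypothetical group names; assumptions about the log format are noted inline.
tight = re.compile(
    r"""
    ^(?P<date>\S+\s+\S+\s+\S+)     # assumed: date, time and AM/PM as three tokens
    \s+(?P<thread_id>\w+)
    \s+PACKET
    \s+(?P<identifier>\w+)
    \s+(?P<proto>\w+)
    \s+(?P<direction>\w+)
    \s+(?P<ip>\d+\.\d+\.\d+\.\d+)
    \s+(?P<xid>\w+)
    \s+(?P<qr>[^\[\]]*?)           # may contain spaces (e.g. R Q) but never a bracket
    \s*\[
    (?P<flag_hex>[0-9]+)
    \s+(?P<flags>[^\[\]]*?)        # confined to the inside of the brackets
    \s*(?P<status>\w+)
    \]
    \s+(?P<record>\w+)
    \s+\.(?P<domain>\S*)\.         # the domain contains no whitespace
    """,
    re.VERBOSE,
)

# Same preprocessing as in the question: turn each "(n)" length marker into a dot.
normalized = re.sub(r"\(\d+\)", ".", line)
m = tight.search(normalized)
print(m.group("date"), "|", m.group("qr"), "|", m.group("domain"))
```

Anchoring with ^ and requiring \s+ between tokens means a line that does not fit the format is rejected after very little backtracking, instead of being retried from every starting position.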

For more information about what an evil regex is and why it's evil, check out this question:
How can I recognize an evil regex?
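
To get a feel for the effect, here is a small, self-contained illustration using the textbook nested-quantifier pattern (a+)+$ rather than the pattern from the question. The helper name time_match is made up for this sketch, and the exact timings will vary by machine and Python version.

```python
import re
import time

evil = re.compile(r"(a+)+$")  # nested quantifiers: a classic backtracking-prone shape
safe = re.compile(r"a+$")     # accepts the same strings without the nesting


def time_match(pattern, text):
    """Return (match result, elapsed seconds) for pattern.match(text)."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


for n in (10, 14, 18):
    text = "a" * n + "b"  # the trailing "b" forces every attempt to fail
    evil_result, evil_secs = time_match(evil, text)
    safe_result, safe_secs = time_match(safe, text)
    print(f"n={n}: nested {evil_secs:.4f}s, flat {safe_secs:.6f}s")
```

Both patterns reject the input, but the nested one typically takes much longer as n grows, because the engine tries many ways of splitting the run of "a" characters between the inner and outer quantifier before giving up.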
