user1718725

Reputation: 21

python, regex, matching strings with repeating characters

I am trying to search Apache log files for specific entries related to specific vulnerability scans. I need to match strings from a separate file against the URI content in the weblogs. Some of the strings I am trying to find contain repeating special characters like '?'.
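Because the attack strings are matched against the URI rather than the whole line, it may help to pull the request URI out first. Below is a minimal sketch that assumes the logs use Apache's Common or Combined Log Format, where the request line is the first quoted field, `"METHOD URI PROTOCOL"`. The helper name `extract_uri` and the sample line are hypothetical.

```python
import re

# Assumes Common/Combined Log Format: the first quoted field is the
# request line, e.g. "GET /path?query HTTP/1.0". Group 1 is the URI.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*"')


def extract_uri(log_line):
    """Hypothetical helper: return the request URI, or None if the
    line has no parsable request field (e.g. a bare "-")."""
    match = REQUEST_RE.search(log_line)
    return match.group(1) if match else None


sample = ('127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.php' + "?" * 8 + ' HTTP/1.0" 200 2326')
print(extract_uri(sample))
```

Scanning only the URI keeps a signature from matching text in other fields, such as the user agent or referer.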

For example, I need to match an attack that contains exactly the string '????????' (eight question marks), but I don't want to be alerted on a longer run such as '??????????????????', because each attack is tied to a specific attack ID number. Therefore, using:

if attack_string in log_file_line:
    alert_me()

...will not work. Because of this, I decided to put the string into a regex:

if re.findall(r'\%s' % re.escape(attack_string),log_file_line):
    alert_me()

...which did not work either: it matches any log file line containing '????????', even when those eight '?' are part of a longer run of '?'.
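Both attempts fail for the same reason. A substring test and an escaped regex with no surrounding context only ask whether eight consecutive '?' appear somewhere in the line, and any longer run contains such a sequence. A quick check, using hypothetical sample lines:

```python
import re

attack_string = "?" * 8
short_line = "GET /index.php" + "?" * 8 + " HTTP/1.0"
long_line = "GET /index.php" + "?" * 18 + " HTTP/1.0"

# Plain substring containment is true for both lines.
in_short = attack_string in short_line
in_long = attack_string in long_line

# An escaped pattern with no context around it behaves the same way.
pattern = re.escape(attack_string)
re_short = re.search(pattern, short_line) is not None
re_long = re.search(pattern, long_line) is not None

print(in_short, in_long, re_short, re_long)  # True True True True
```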

I then tried adding boundaries to the regex:

if re.findall(r'\\B\%s\\B' % re.escape(attack_string),log_file_line):
    alert_me()

...which stopped matching in both cases. I need to be able to dynamically assign the string I am looking for but I don't want to match on just any line that contains the string. How can I accomplish this?
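Word boundaries are unlikely to help here. `\b` and `\B` are defined in terms of word characters (letters, digits, underscore), and '?' is not one, so `\B` is satisfied between two '?' characters but not between a letter and a '?'. Separately, in a raw string `r'\\B'` is two backslashes followed by `B`, which the regex engine reads as a literal backslash followed by the letter B. That is probably why the attempt above stopped matching entirely. The sketch below uses the intended `\B` syntax on hypothetical sample lines and shows that it rejects the exact run and accepts the longer one, which is the opposite of what is wanted:

```python
import re

short_line = "GET /index.php" + "?" * 8 + " HTTP/1.0"
long_line = "GET /index.php" + "?" * 18 + " HTTP/1.0"

# r"\B" is one backslash plus B: a "not a word boundary" assertion.
pattern = r"\B" + re.escape("?" * 8) + r"\B"

# Between 'p' (a word character) and the first '?' there IS a boundary,
# so the exact run of eight fails; inside the longer run, every position
# between two '?' is a non-boundary, so a match is found there.
short_hit = re.search(pattern, short_line) is not None
long_hit = re.search(pattern, long_line) is not None
print(short_hit, long_hit)  # False True
```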

Upvotes: 2

Views: 931

Answers (1)

Toto

Reputation: 91375

How about:

(?:[^?]|^)\?{8}(?:[^?]|$)
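From Python, the same pattern can be built around the escaped attack string so the signature is still assigned dynamically. This sketch assumes the signature consists only of '?' characters, because the guards on either side specifically forbid an adjacent '?'. The helper name `matches_exact_run` is hypothetical.

```python
import re


def matches_exact_run(attack_string, line):
    """Hypothetical helper: True if attack_string occurs in line with no
    extra '?' directly before or after it (the answer's pattern)."""
    pattern = r"(?:[^?]|^)" + re.escape(attack_string) + r"(?:[^?]|$)"
    return re.search(pattern, line) is not None


attack = "?" * 8
print(matches_exact_run(attack, "GET /index.php" + "?" * 8 + " HTTP/1.0"))   # True
print(matches_exact_run(attack, "GET /index.php" + "?" * 18 + " HTTP/1.0"))  # False
print(matches_exact_run(attack, "?" * 8))                                   # True
```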

Explanation:

(?-imsx:(?:[^?]|^)\?{8}(?:[^?]|$))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    [^?]                     any character except: '?'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ^                        the beginning of the string
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  \?{8}                    '?' (8 times)
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    [^?]                     any character except: '?'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
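One side effect of the `(?:[^?]|^)` and `(?:[^?]|$)` groups is that they consume the neighbouring character. The match text therefore includes it, and two runs separated by a single character cannot both be found by `re.findall`. A possible variant with the same intent uses a negative lookbehind and lookahead, which are zero-width. This is a sketch rather than the answer's own code, and the helper name `find_exact_runs` is hypothetical.

```python
import re

ANSWER_PATTERN = r"(?:[^?]|^)\?{8}(?:[^?]|$)"


def find_exact_runs(attack_string, line):
    """Hypothetical helper: start offsets where attack_string occurs with
    no '?' immediately before or after it. The lookarounds check the
    neighbours without consuming them."""
    pattern = r"(?<!\?)" + re.escape(attack_string) + r"(?!\?)"
    return [m.start() for m in re.finditer(pattern, line)]


attack = "?" * 8
line = attack + "x" + attack  # two exact runs separated by one character

print(find_exact_runs(attack, line))             # [0, 9]
print(find_exact_runs(attack, "?" * 18))         # []
print(len(re.findall(ANSWER_PATTERN, line)))     # 1 (the 'x' was consumed)
```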

Upvotes: 1
