user3498593

Reputation: 95

Parsing Snort Alert File with Regex

I'm trying to use a regex in Python to parse the source and destination (IPs and ports) and the timestamp out of a Snort alert file. An example line:

03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80

I have a regex for the IP, but it doesn't match because each IP is followed by a port. How can I capture the port separately from the IP?

^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$

Upvotes: 2

Views: 2331

Answers (4)

Ro Yo Mi

Reputation: 15000

Description

^((?:[0-9]{2}[-\/:.]){5}[0-9]{6}).*[{]TCP[}]\s*(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))\s*->\s*(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))

This regular expression will do the following:

  • Captures the timestamp into capture group 1
  • Captures the source into capture groups 2 (IP:port), 3 (IP address) and 4 (port)
  • Captures the destination into capture groups 5 (IP:port), 6 (IP address) and 7 (port)
  • Requires the source and destination to be preceded by {TCP}, in case the message text also contains an IP address

Example

Live Demo

https://regex101.com/r/hD4fW8/1

Sample text

03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80

Sample Matches

MATCH 1
1.  [0-21]  `03/09-14:10:43.323717`
2.  [145-165]   `172.16.116.194:28692`
3.  [145-159]   `172.16.116.194`
4.  [160-165]   `28692`
5.  [169-186]   `205.181.112.65:80`
6.  [169-183]   `205.181.112.65`
7.  [184-186]   `80`
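
Since the question asks about Python, here is a minimal sketch of applying the same expression with the re module. The pattern is copied unchanged from above; the variable names (PATTERN, src_ip and so on) are only illustrative, and converting the ports to int is my own assumption about what is wanted.

```python
import re

# Pattern copied from the answer; split across lines only for readability.
PATTERN = re.compile(
    r'^((?:[0-9]{2}[-\/:.]){5}[0-9]{6}).*[{]TCP[}]\s*'
    r'(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))\s*->\s*'
    r'(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))'
)

line = ('03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] '
        '[Classification: A Network Trojan was detected] [Priority: 1] {TCP} '
        '172.16.116.194:28692 -> 205.181.112.65:80')

m = PATTERN.search(line)
if m is None:
    raise ValueError('line does not look like a TCP alert')

timestamp = m.group(1)
src_ip, src_port = m.group(3), int(m.group(4))  # group 2 is "ip:port" combined
dst_ip, dst_port = m.group(6), int(m.group(7))  # group 5 is "ip:port" combined
```

Lines that do not contain {TCP} will not match, so the None check matters when looping over a real alert file.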

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?:                      group, but do not capture (5 times):
----------------------------------------------------------------------
      [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
      [-\/:.]                  any character of: '-', '/', ':', '.'
----------------------------------------------------------------------
    ){5}                     end of grouping
----------------------------------------------------------------------
    [0-9]{6}                 any character of: '0' to '9' (6 times)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  [{]                      any character of: '{'
----------------------------------------------------------------------
  TCP                      'TCP'
----------------------------------------------------------------------
  [}]                      any character of: '}'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      (?:                      group, but do not capture (between 1
                               and 3 times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        [0-9]{1,3}               any character of: '0' to '9'
                                 (between 1 and 3 times (matching the
                                 most amount possible))
----------------------------------------------------------------------
        [.]                      any character of: '.'
----------------------------------------------------------------------
      ){1,3}                   end of grouping
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    (                        group and capture to \4:
----------------------------------------------------------------------
      [0-9]{1,6}               any character of: '0' to '9' (between
                               1 and 6 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \4
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  ->                       '->'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    (                        group and capture to \6:
----------------------------------------------------------------------
      (?:                      group, but do not capture (between 1
                               and 3 times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        [0-9]{1,3}               any character of: '0' to '9'
                                 (between 1 and 3 times (matching the
                                 most amount possible))
----------------------------------------------------------------------
        [.]                      any character of: '.'
----------------------------------------------------------------------
      ){1,3}                   end of grouping
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \6
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    (                        group and capture to \7:
----------------------------------------------------------------------
      [0-9]{1,6}               any character of: '0' to '9' (between
                               1 and 6 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \7
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
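
The breakdown above can also be kept next to the pattern itself. This is a sketch, not part of the original answer: it writes the same expression with Python's re.VERBOSE flag so that each piece carries its own comment. The name SNORT_RE is hypothetical.

```python
import re

# Same expression as above, laid out with re.VERBOSE. In verbose mode,
# whitespace and "#" comments inside the pattern are ignored.
SNORT_RE = re.compile(r"""
    ^
    ( (?:[0-9]{2}[-\/:.]){5} [0-9]{6} )       # group 1: timestamp
    .* [{]TCP[}] \s*                          # skip ahead to the {TCP} marker
    (                                         # group 2: source IP:port
      ( (?:[0-9]{1,3}[.]){1,3} [0-9]{1,3} )   # group 3: source IP
      : ([0-9]{1,6})                          # group 4: source port
    )
    \s* -> \s*
    (                                         # group 5: destination IP:port
      ( (?:[0-9]{1,3}[.]){1,3} [0-9]{1,3} )   # group 6: destination IP
      : ([0-9]{1,6})                          # group 7: destination port
    )
""", re.VERBOSE)

line = ('03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] '
        '[Classification: A Network Trojan was detected] [Priority: 1] {TCP} '
        '172.16.116.194:28692 -> 205.181.112.65:80')

groups = SNORT_RE.search(line).groups()
```

Here groups is a 7-tuple in the same order as the Sample Matches list.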

Upvotes: 1

tobi-wan-kenobi

Reputation: 36

If I understand you correctly, you want to capture the IPs and the ports separately, right?

In that case, using "groups" in the regular expression would solve your problem:

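result = re.search(r'((\d{1,3}\.){3}\d{1,3}):(\d{1,5})', line)

Here line holds one line of the alert file. Now result.group(1) contains the IP address and result.group(3) the port. (Group 2 is the repeated inner group and only holds the last octet it matched, so it can be ignored.) Note that re.search returns only the first match, which is the source address.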
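
To get both the source and the destination with this approach, one option is to iterate over all matches. A small sketch, assuming a non-capturing (?:...) group for the repeated octet so that the IP and port become groups 1 and 2; the ENDPOINT name and the shortened sample string are mine:

```python
import re

# Tail end of the alert line from the question.
line = '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

# (?:...) groups without capturing, so the repeated octet
# does not consume a group number.
ENDPOINT = re.compile(r'((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})')

# finditer yields every non-overlapping match, left to right:
# first the source, then the destination.
endpoints = [(m.group(1), int(m.group(2))) for m in ENDPOINT.finditer(line)]
```

This relies on the source always appearing before the destination on the line, and it would also pick up any IP:port pair that happens to appear in the alert message itself.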

Upvotes: 1

Will

Reputation: 24699

This should extract the necessary parts from the full line:

r'([0-9:./-]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})'

See this example:

In [22]: line = '03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

In [23]: m = re.match(r'([0-9:./-]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})', line)

In [24]: m.group(1)
Out[24]: '03/09-14:10:43.323717'

In [25]: m.group(2)
Out[25]: '172.16.116.194'

In [26]: m.group(3)
Out[26]: '28692'

In [27]: m.group(4)
Out[27]: '205.181.112.65'

In [28]: m.group(5)
Out[28]: '80'
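
For a whole alert file, the same pattern can be given named groups and applied line by line. This is only a sketch: the group names, ALERT_RE and parse_alerts are hypothetical, and silently skipping non-matching lines is an assumption about the desired behaviour.

```python
import re

# The pattern from this answer, with (?P<name>...) named groups added.
ALERT_RE = re.compile(
    r'(?P<timestamp>[0-9:./-]+)\s+.*?'
    r'(?P<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<src_port>\d{1,5})'
    r'\s+->\s+'
    r'(?P<dst_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<dst_port>\d{1,5})'
)

def parse_alerts(lines):
    """Yield one dict per line that matches; skip lines that do not."""
    for line in lines:
        m = ALERT_RE.match(line)
        if m:
            yield m.groupdict()

sample = [
    '03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] '
    '[Classification: A Network Trojan was detected] [Priority: 1] {TCP} '
    '172.16.116.194:28692 -> 205.181.112.65:80',
    'not an alert line',
]
alerts = list(parse_alerts(sample))
```

Because an open file iterates over its lines, parse_alerts(open(path)) should work the same way on a real alert file, where path is whatever location your Snort setup writes to.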

Upvotes: 3

Yaron

Reputation: 1242

You can separate them into different capture groups this way:

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})

Dropping both ^ and $ lets the pattern match in the middle of a line rather than only when the IP is the entire line.
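
A quick way to see the difference, as a sketch using the tail of the sample line from the question (the variable names are mine):

```python
import re

text = '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

# The original pattern: ^ and $ require the IP to be the entire string.
anchored = re.search(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$', text)

# Without the anchors, and with the port in its own group.
unanchored = re.search(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})', text)

print(anchored)             # None: the string contains more than a bare IP
print(unanchored.groups())  # IP and port of the first match
```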

Upvotes: 1
