Reputation: 95
I'm trying to use regex in Python to parse the source and destination (IPs and ports) and the timestamp out of a Snort alert file. Example below:
03/09-14:10:43.323717 [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80
I have a regex for the IP, but it doesn't match correctly because the port is attached to the IP. How can I get the port separate from the IP?
^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$
Upvotes: 2
Views: 2331
Reputation: 15000
^((?:[0-9]{2}[-\/:.]){5}[0-9]{6}).*[{]TCP[}]\s*(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))\s*->\s*(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))
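This regular expression will do the following:
- capture the timestamp into group 1
- capture the source as ip:port (group 2), IP (group 3) and port (group 4)
- capture the destination as ip:port (group 5), IP (group 6) and port (group 7)
- require the addresses to follow `{TCP}`, in case the message also contains an IP address
Live Demo
https://regex101.com/r/hD4fW8/1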
Sample text
03/09-14:10:43.323717 [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80
Sample Matches
MATCH 1
1. [0-21] `03/09-14:10:43.323717`
2. [145-165] `172.16.116.194:28692`
3. [145-159] `172.16.116.194`
4. [160-165] `28692`
5. [169-186] `205.181.112.65:80`
6. [169-183] `205.181.112.65`
7. [184-186] `80`
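For reference, here is a minimal sketch of applying that pattern from Python with the standard `re` module. The names `SNORT_RE`, `line` and the unpacked variables are illustrative, not part of the answer; the group numbers follow the sample matches above.

```python
import re

# Pattern from the answer, split across string literals only for line length.
# [{]TCP[}] makes the address search start after the protocol tag, so an IP
# that appears inside the message text is not picked up.
SNORT_RE = re.compile(
    r'^((?:[0-9]{2}[-\/:.]){5}[0-9]{6}).*[{]TCP[}]\s*'
    r'(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))\s*->\s*'
    r'(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))'
)

line = ('03/09-14:10:43.323717 [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] '
        '[Classification: A Network Trojan was detected] [Priority: 1] '
        '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80')

m = SNORT_RE.search(line)
if m:
    timestamp = m.group(1)               # 03/09-14:10:43.323717
    src_ip, src_port = m.group(3), m.group(4)
    dst_ip, dst_port = m.group(6), m.group(7)
    print(timestamp, src_ip, src_port, dst_ip, dst_port)
```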
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?: group, but do not capture (5 times):
----------------------------------------------------------------------
[0-9]{2} any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
[-\/:.] any character of: '-', '\/', ':', '.'
----------------------------------------------------------------------
){5} end of grouping
----------------------------------------------------------------------
[0-9]{6} any character of: '0' to '9' (6 times)
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
[{] any character of: '{'
----------------------------------------------------------------------
TCP 'TCP'
----------------------------------------------------------------------
[}] any character of: '}'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
(?: group, but do not capture (between 1
and 3 times (matching the most amount
possible)):
----------------------------------------------------------------------
[0-9]{1,3} any character of: '0' to '9'
(between 1 and 3 times (matching the
most amount possible))
----------------------------------------------------------------------
[.] any character of: '.'
----------------------------------------------------------------------
){1,3} end of grouping
----------------------------------------------------------------------
[0-9]{1,3} any character of: '0' to '9' (between
1 and 3 times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
( group and capture to \4:
----------------------------------------------------------------------
[0-9]{1,6} any character of: '0' to '9' (between
1 and 6 times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \4
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
-> '->'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \5:
----------------------------------------------------------------------
( group and capture to \6:
----------------------------------------------------------------------
(?: group, but do not capture (between 1
and 3 times (matching the most amount
possible)):
----------------------------------------------------------------------
[0-9]{1,3} any character of: '0' to '9'
(between 1 and 3 times (matching the
most amount possible))
----------------------------------------------------------------------
[.] any character of: '.'
----------------------------------------------------------------------
){1,3} end of grouping
----------------------------------------------------------------------
[0-9]{1,3} any character of: '0' to '9' (between
1 and 3 times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \6
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
( group and capture to \7:
----------------------------------------------------------------------
[0-9]{1,6} any character of: '0' to '9' (between
1 and 6 times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \7
----------------------------------------------------------------------
) end of \5
----------------------------------------------------------------------
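The same pattern can be kept readable in Python source with `re.VERBOSE`, which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment, so each piece can carry a note mirroring the explanation above. This is a sketch: the name `SNORT_VERBOSE` is illustrative, and `[-\/:.]` is written as `[-/:.]` because Python does not need the slash escaped.

```python
import re

# Same groups as the one-line pattern; whitespace and '#' comments are ignored
# by re.VERBOSE (none of the character classes below contain spaces).
SNORT_VERBOSE = re.compile(r"""
    ^
    ( (?:[0-9]{2}[-/:.]){5} [0-9]{6} )        # \1 timestamp
    .* [{]TCP[}] \s*                           # skip ahead to the {TCP} tag
    ( ( (?:[0-9]{1,3}[.]){1,3} [0-9]{1,3} )    # \2 source ip:port, \3 source IP
      : ( [0-9]{1,6} ) )                       # \4 source port
    \s* -> \s*
    ( ( (?:[0-9]{1,3}[.]){1,3} [0-9]{1,3} )    # \5 destination ip:port, \6 destination IP
      : ( [0-9]{1,6} ) )                       # \7 destination port
""", re.VERBOSE)

line = ('03/09-14:10:43.323717 [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] '
        '[Classification: A Network Trojan was detected] [Priority: 1] '
        '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80')

m = SNORT_VERBOSE.match(line)
print(m.groups())
```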
Upvotes: 1
Reputation: 36
If I understand you correctly, you want to capture the IPs and the ports separately, right?
In that case, using "groups" in the regular expression would solve your problem:
import re
result = re.search(r'((\d{1,3}\.){3}\d{1,3}):(\d{1,5})', line)
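Now result.group(1) contains the IP address and result.group(3) the port. (Group 2 only holds the last repetition of the inner (\d{1,3}\.) group, which is why the port ends up in group 3.)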
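Since `re.search` returns only the first match, the code above finds the source pair. A minimal sketch for getting both pairs with `re.findall`, assuming the line contains exactly two ip:port pairs (the unpacking raises `ValueError` otherwise). The inner group is made non-capturing so each match yields just (ip, port); `line`, `pairs` and the unpacked names are illustrative.

```python
import re

line = ('03/09-14:10:43.323717 [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] '
        '[Classification: A Network Trojan was detected] [Priority: 1] '
        '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80')

# (?:...) keeps the repeated octet group out of the results, so findall
# returns a list of (ip, port) tuples, one per ip:port occurrence.
pairs = re.findall(r'((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})', line)

# Assumes exactly one source and one destination pair on the line.
(src_ip, src_port), (dst_ip, dst_port) = pairs
print(src_ip, src_port, dst_ip, dst_port)
```

If the alert message itself can contain an ip:port, this picks it up too, which is the case the `{TCP}` anchor in the first answer guards against.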
Upvotes: 1
Reputation: 24699
This should extract the necessary parts from the full line:
r'([0-9:./-]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})'
See this example:
In [21]: import re
In [22]: line = '03/09-14:10:43.323717 [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80'
In [23]: m = re.match(r'([0-9:./-]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})', line)
In [24]: m.group(1)
Out[24]: '03/09-14:10:43.323717'
In [25]: m.group(2)
Out[25]: '172.16.116.194'
In [26]: m.group(3)
Out[26]: '28692'
In [27]: m.group(4)
Out[27]: '205.181.112.65'
In [28]: m.group(5)
Out[28]: '80'
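To run this over a whole alert file, one option is to compile the pattern once and apply it per line. A sketch, assuming a plain-text Snort alert file with one alert per line; `parse_alerts`, `ALERT_RE` and the `'alert'` path are hypothetical names, and converting the ports to `int` is an added choice, not part of the answer.

```python
import re

# Pattern from the answer above, compiled once for reuse on every line.
ALERT_RE = re.compile(
    r'([0-9:./-]+)\s+.*?'
    r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+'
    r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})'
)


def parse_alerts(lines):
    """Yield (timestamp, src_ip, src_port, dst_ip, dst_port) per matching line.

    Lines that do not match (blank lines, other formats) are skipped.
    Ports are converted to int.
    """
    for line in lines:
        m = ALERT_RE.match(line)
        if m is None:
            continue
        ts, src_ip, src_port, dst_ip, dst_port = m.groups()
        yield ts, src_ip, int(src_port), dst_ip, int(dst_port)


# Hypothetical usage; 'alert' is a placeholder path:
# with open('alert') as f:
#     for record in parse_alerts(f):
#         print(record)
```

Because the pattern has no `$` anchor, the trailing newline on each line read from a file does not get in the way.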
Upvotes: 3
Reputation: 1242
You can separate them into different capture groups this way:
(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})
Dropping both ^ and $ lets the pattern match in the middle of the line, not only when the IP is the whole line.
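A quick sketch showing the effect of the anchors, using a shortened fragment of the sample line (the fragment and variable names are illustrative):

```python
import re

# Shortened fragment of the sample alert line.
fragment = '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

anchored = r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'              # the question's pattern
unanchored = r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})'  # this answer's pattern

# With ^ and $ the IP would have to be the entire string, so nothing is found.
print(re.search(anchored, fragment))    # None

# Without anchors, search() finds the first ip:port pair anywhere in the string.
m = re.search(unanchored, fragment)
print(m.group(1), m.group(2))           # 172.16.116.194 28692
```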
Upvotes: 1