Reputation: 59
We are looking to dump our PMDF logs into Splunk and I am trying to parse the PMDF SMTP logs, specifically the message, and I'm hitting an issue where a named capturing group (dst_channel) may or may not have a value. Here is my regex so far:
\d{2}\-\w{3}\-\d{4}\s\d{2}\:\d{2}\:\d{2}\.\d{2}\s(?P<src_channel>\w+)\s+(?P<dst_channel>\w+)\s(?P<code>\w+)\s(?P<bytes>\d+)\s(?P<from>\w.+)\srfc822
I'm able to match the following message, in which tcp_msx_out_2 is the dst_channel
02-Feb-2017 08:00:19.60 tcp_exempt tcp_msx_out_2 E 2 [email protected] rfc822;[email protected] [email protected] <[email protected]> pmdf list.xyz.com ([x.x.x.x])
However, I'm not matching the following log line, which doesn't contain a dst_channel value:
02-Feb-2017 09:00:01.59 tcp_imap_int Q 12 [email protected] rfc822;[email protected] [email protected] <6940401380880269855036@PT-D69> pmdf [email protected]: smtp;452 4.2.2 Over quota
The next named capturing group I have is code (E in the first message example, and Q in the second), and when the dst_channel is not there, the regex does not capture all of the codes.
How can I modify my regex for conditional statements so that if the dst_channel is there, it grabs the value, but if not, regex continues on and is able to consistently grab the values for the other named capturing groups I have?
Upvotes: 2
Views: 376
Reputation: 626835
I suggest you use
\d{2}-\w{3}-\d{4}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+(?P<src_channel>\w+)(?:\s+(?P<dst_channel>\w+))?\s+(?P<code>\w+)\s+(?P<bytes>\d+)\s+(?P<from>\S+)\s+rfc822
See the regex demo.
Basically, replace all \s with \s+ and make the dst channel group optional by wrapping both the \s+ and the whole dst channel group with an optional non-capturing group.
Also, the from group pattern should be replaced with \S+ (one or more chars other than whitespace) because you want to match an email, and .+ may (and usually does) overmatch.
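As a quick sanity check outside Splunk, the optional-group pattern can be tried in Python, whose re module accepts the same (?P<name>...) syntax. This is only a sketch: the two sample lines are shortened versions of the ones in the question, and the addresses sender@example.com and rcpt@example.com are placeholders I made up, not values from the real logs.

```python
import re

# Same pattern as suggested above, split across lines for readability.
PATTERN = re.compile(
    r"\d{2}-\w{3}-\d{4}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+"
    r"(?P<src_channel>\w+)"
    r"(?:\s+(?P<dst_channel>\w+))?"   # whitespace + dst_channel, both optional
    r"\s+(?P<code>\w+)\s+(?P<bytes>\d+)\s+(?P<from>\S+)\s+rfc822"
)

# Placeholder addresses (assumption): the real ones are not shown in the question.
with_dst = ("02-Feb-2017 08:00:19.60 tcp_exempt tcp_msx_out_2 E 2 "
            "sender@example.com rfc822;rcpt@example.com")
without_dst = ("02-Feb-2017 09:00:01.59 tcp_imap_int Q 12 "
               "sender@example.com rfc822;rcpt@example.com")


def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


for sample in (with_dst, without_dst):
    print(parse(sample))
```

When dst_channel is absent, the group simply does not participate in the match, so Python reports it as None while code and bytes are still captured.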
Upvotes: 1
Reputation: 8163
It worked when I changed the \w+ in the dst_channel group to \w*:
\d{2}\-\w{3}\-\d{4}\s\d{2}\:\d{2}\:\d{2}\.\d{2}\s(?P<src_channel>\w+)\s+(?P<dst_channel>\w*)\s(?P<code>\w+)\s(?P<bytes>\d+)\s(?P<from>\w.+)\srfc822
You can test it here
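One thing worth checking with this variant: because a single \s still follows the now possibly empty dst_channel group, the pattern needs at least two whitespace characters between src_channel and code when dst_channel is missing. Whether the real PMDF logs pad the missing column that way is an assumption; the sketch below uses made-up placeholder addresses and compares a line with two spaces against one with a single space.

```python
import re

# The \w* variant from this answer, unchanged.
STAR_PATTERN = re.compile(
    r"\d{2}\-\w{3}\-\d{4}\s\d{2}\:\d{2}\:\d{2}\.\d{2}\s"
    r"(?P<src_channel>\w+)\s+(?P<dst_channel>\w*)\s"
    r"(?P<code>\w+)\s(?P<bytes>\d+)\s(?P<from>\w.+)\srfc822"
)

# Placeholder addresses (assumption); only the spacing after tcp_imap_int differs.
two_spaces = ("02-Feb-2017 09:00:01.59 tcp_imap_int  Q 12 "
              "sender@example.com rfc822;rcpt@example.com")
one_space = ("02-Feb-2017 09:00:01.59 tcp_imap_int Q 12 "
             "sender@example.com rfc822;rcpt@example.com")

padded = STAR_PATTERN.search(two_spaces)
unpadded = STAR_PATTERN.search(one_space)

# With two spaces, dst_channel matches the empty string and code is captured.
print(padded.groupdict() if padded else None)
# With a single space, there is no whitespace left for the \s after dst_channel.
print(unpadded.groupdict() if unpadded else None)
```

If the raw logs separate the fields with a single space, the optional non-capturing group from the other answer is the safer choice.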
Upvotes: 1