Reputation: 13
I have a question about Java regular expressions and capture groups. My goal is to parse a log file and extract the relevant fields into QRadar. I am not writing Java code as such, but QRadar uses Java regular expressions to parse log files, and my question is really a regular-expression problem, so I am posting it here in the hope of getting some pointers or a solution.
Here is my question:
I am trying to parse a log file in CEF (Common Event Format). Here are a couple of lines from it:
[blah, blah...] cs1=DataValue1 cs2=DataValue2
[blah, blah...] cs2=DataValue3 cs1=DataValue4
My goal is to extract the data values of the fields cs1 and cs2 from the above lines, that is, to capture the values DataValue1, DataValue2, DataValue3 and DataValue4.
I came up with the following regular expressions to do this:
RegEx for cs1 field - \scs1\=(.*?)\s\w+\=
RegEx for cs2 field - \scs2\=(.*?)\s\w+\=
Using the above regular expressions and capture groups I am able to capture the data values, but only in certain cases. If you look at the log entries above, you will notice that the order of the fields cs1 and cs2 within a log entry is not fixed. Sometimes the cs1 field appears before cs2 (in the middle of the log entry), and at other times cs1 appears at the end of the entry (as the last field). The same is true of the cs2 field. My current regular expressions only work when the field is not the last one, because they require whitespace and another key= to follow the value.
For example, for the first log entry, [blah, blah...] cs1=DataValue1 cs2=DataValue2, my regular expressions correctly extract the value of the cs1 field but fail for the cs2 field, since cs2 is at the end of the line.
Similarly, for the second log entry, [blah, blah...] cs2=DataValue3 cs1=DataValue4, my regular expressions correctly extract the value of the cs2 field but fail for the cs1 field, since cs1 is at the end of the line.
My question is - What should my regular expression be so that it can parse/extract the data field value correctly irrespective of whether the field appears in the middle or at the end of the log file entry?
Any help is appreciated.
Regards,
P.S.: In case anyone is interested, I also posted this question on the QRadar forum (https://www.ibm.com/developerworks/community/forums/html/topic?id=f48bc2dc-2ccb-42df-b543-dc0522491fad), but have had no responses yet.
Upvotes: 1
Views: 2092
Reputation: 174836
Just use lookaheads to capture the values of the cs1 and cs2 fields if you don't know the order in which they appear.
^(?=.*?\scs1=(\S+))(?=.*\scs2=(\S+))
As a Java string literal, the regex would be:
^(?=.*?\\scs1=(\\S+))(?=.*\\scs2=(\\S+))
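A minimal sketch of using this pattern with java.util.regex, run against the two sample lines from the question. The class and helper names are hypothetical. Because both lookaheads are zero-width and anchored at ^, each one scans the whole line independently, so field order does not matter; find() (not matches()) is used because the overall match is empty.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // The answer's pattern: two lookaheads anchored at the start of the line.
    static final Pattern FIELDS =
            Pattern.compile("^(?=.*?\\scs1=(\\S+))(?=.*\\scs2=(\\S+))");

    // Returns {cs1, cs2}, or null if either field is missing from the line.
    static String[] extract(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Groups captured inside a successful lookahead keep their values.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "[blah, blah...] cs1=DataValue1 cs2=DataValue2",
            "[blah, blah...] cs2=DataValue3 cs1=DataValue4"
        };
        for (String line : lines) {
            String[] v = extract(line);
            System.out.println("cs1=" + v[0] + ", cs2=" + v[1]);
        }
    }
}
```

Two consequences of the design: the line only matches when both fields are present, and (\S+) stops at the first whitespace, so a value containing spaces would be truncated.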
Group index 1 contains the value of cs1, and group index 2 contains the value of cs2.
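If one pattern per field with a single capture group is preferred (as in the question), a variant of the question's own regex can end the value with a lookahead for either the next key= or the end of the line: \scs1=(.*?)(?=\s\w+=|$). This is a swapped-in alternative, not the answer's pattern; the Java class and helper names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerFieldDemo {
    // One pattern per field: the lazy value stops where the lookahead sees
    // either whitespace followed by the next "key=" or the end of the input.
    // Pattern.quote guards against regex metacharacters in the field name.
    static Pattern fieldPattern(String key) {
        return Pattern.compile("\\s" + Pattern.quote(key) + "=(.*?)(?=\\s\\w+=|$)");
    }

    // Returns the field's value, or null if the field is absent.
    static String extract(String key, String line) {
        Matcher m = fieldPattern(key).matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line1 = "[blah, blah...] cs1=DataValue1 cs2=DataValue2";
        String line2 = "[blah, blah...] cs2=DataValue3 cs1=DataValue4";
        System.out.println(extract("cs1", line1)); // DataValue1
        System.out.println(extract("cs2", line1)); // DataValue2
        System.out.println(extract("cs1", line2)); // DataValue4
        System.out.println(extract("cs2", line2)); // DataValue3
    }
}
```

Because the terminator is a lookahead rather than \S+, a value such as "Data Value" (with a space but no = after the second word) is captured whole.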
Upvotes: 2