Reputation: 89
I am attempting to use a fairly complex REGEX expression (see REGEX101 demos below), which I amended slightly from one created by an expert on this site. It parses specific patterns of log events:
These log sequences will always begin with a random selection of EXE_IN
or EXE_CO
events, preceded sequence numbers. These selections can be any number, in any order. In this case, we just have two EXE
events but this may be 200. Or 1. Note that there is a sequence number and we need to capture it.
The second part of the sequence will always be a series of digit-prefaced CONTENT.ACCESS
events. Again from 1 to infinity in length.
The following demo shows a working example and probably conveys the concept better than I can : Demo 1
It nicely captures a full match, sequence number, and event in separate groups.
I need to add a timestamp to the pattern (after the sequence number, with a preceding underscore), and then parse this event log e.g.
I need to capture the timestamps as well.
I attempted to adjust the regex expression, with mixed results. Please see this demo: demo2
Ideally I'd like to see something like this for each event:
Match n
Full match 266-308 `2_12/08/2014 09:17CONTENT_ACCESS`
Group 1. 266-267 `2`
Group 2. 268-284 `12/08/2014 09:17`
Group 3. 284-308 `CONTENT_ACCESS`
I hope you can help me. REGEX101 pcre testing is sufficient (for the record, I am using perl-compatible str_match_all_perl function in R).
Many thanks in advance.
Upvotes: 0
Views: 76
Reputation: 1052
(\d+)_(.*?)(EXE_CO|EXE_IN|CONTENT_ACCESS)
https://regex101.com/r/EHHcKm/1
Due to comments it was changed to (?:\G(?!^)(?(?=\d+_\d{2}\/\d{2}\/\d{4}\s\d{2}\:\d{2}(?:EXE_CO|EXE_IN))(?<!\d_\d{2}\/\d{2}\/\d{4}\s\d{2}\:\d{2}CONTENT_ACCESS))|(?=(?:\d+_\d{2}\/\d{2}\/\d{4}\s\d{2}\:\d{2}(?:EXE_CO|EXE_IN))+(?:\d+_\d{2}\/\d{2}\/\d{4}\s\d{2}\:\d{2}CONTENT_ACCESS)+))(\d+)_(\d{2}\/\d{2}\/\d{4}\s\d{2}\:\d{2})(EXE_CO|EXE_IN|CONTENT_ACCESS)
https://regex101.com/r/EHHcKm/3
Ans also another version, which is shorter
(?:\G(?!^)(?(?=\d+_.{16}(?:EXE_CO|EXE_IN))(?<!\d_.{16}CONTENT_ACCESS))|(?=(?:\d+_.{16}(?:EXE_CO|EXE_IN))+(?:\d+_.{16}CONTENT_ACCESS)+))(\d+)_(.{16})(EXE_CO|EXE_IN|CONTENT_ACCESS)
https://regex101.com/r/EHHcKm/4
And even more shorter (?:\G(?!^)(?(?=\d+_.{16}E)(?<!S))|(?=(?:\d+_.{16}(?:EXE_CO|EXE_IN))+\d+_.{16}C))(\d+)_(.{16})(EXE_CO|EXE_IN|CONTENT_ACCESS)
https://regex101.com/r/EHHcKm/5
And super short (?:\G|(?=\d+_.{16}E.*CON))(\d+)_(.*?)(EXE_CO|EXE_IN|CONTENT_ACCESS)
https://regex101.com/r/EHHcKm/8
Upvotes: 1