Reputation: 271
I am trying to extract the job name and region from a Splunk source using regex.
Below is the format of my sample source:
/home/app/abc/logs/20200817/job_DAILY_HR_REPORT_44414_USA_log
With the pattern below, I am able to extract the job name:
(?<logdir>\/[\W\w]+\/[\W\w]+\/)(?<date>[^\/]+)\/job_(?<jobname>.+)_\d+
Here is the match so far:
Full match 0-53 /home/app/abc/logs/20200817/job_DAILY_HR_REPORT_44414
Group `logdir` 0-19 /home/app/abc/logs/
Group `date` 19-27 20200817
Group `jobname` 32-47 DAILY_HR_REPORT
I also need USA (the region) from the source. Can you please suggest how to do that? The region always appears after the number field (44414), which can vary in number of digits, e.g. 123, 1234, 56789.
Thank you in advance.
Upvotes: 0
Views: 1348
Reputation: 163287
You could make the pattern a bit more specific about what you allow to match, as [\W\w]+ and .+ will cause more backtracking to fit the rest of the pattern.
Then for the region you can add a named group at the end, (?<region>[^\W_]+), matching one or more word characters except an underscore.
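To see what the region group does on its own, here is a minimal Python sketch. It is only an illustration outside Splunk: Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, and the helper name `job_and_region` is made up for this example. It appends the suggested group to the tail of the asker's original pattern.

```python
import re

# [^\W_] means "a word character that is not an underscore",
# i.e. letters and digits only.
ALNUM_RUN = re.compile(r"[^\W_]+")

# Tail of the asker's pattern with the suggested region group appended.
# Python needs (?P<name>...) where Splunk/PCRE accepts (?<name>...).
TAIL = re.compile(r"job_(?P<jobname>.+)_\d+_(?P<region>[^\W_]+)")


def job_and_region(source):
    """Hypothetical helper: return (jobname, region) or None if no match."""
    m = TAIL.search(source)
    if m is None:
        return None
    return m.group("jobname"), m.group("region")


sample = "/home/app/abc/logs/20200817/job_DAILY_HR_REPORT_44414_USA_log"
print(ALNUM_RUN.findall("44414_USA_log"))
print(job_and_region(sample))
```

Because the class excludes the underscore, the region group stops at the `_` before `log` instead of swallowing the rest of the file name.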
The full pattern, explained in parts below it:
(?<logdir>\/(?:[^\/]+\/)*)(?<date>(?:19|20)\d{2}(?:0?[1-9]|1[012])(?:0[1-9]|[12]\d|3[01]))\/job_(?<jobname>\w+)_\d+_(?<region>[^\W_]+)_log
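As a quick check outside Splunk, the full pattern can be tried against the sample source from the question. This is a sketch in Python, so the named groups are rewritten as `(?P<name>...)` and the `/` is left unescaped; the variable names are assumptions for illustration, and in Splunk the pattern would keep its original `(?<name>...)` syntax.

```python
import re

# The answer's pattern, split over three lines for readability.
# Only the named-group syntax and the slash escaping differ from the original.
PATTERN = re.compile(
    r"(?P<logdir>/(?:[^/]+/)*)"
    r"(?P<date>(?:19|20)\d{2}(?:0?[1-9]|1[012])(?:0[1-9]|[12]\d|3[01]))"
    r"/job_(?P<jobname>\w+)_\d+_(?P<region>[^\W_]+)_log"
)

source = "/home/app/abc/logs/20200817/job_DAILY_HR_REPORT_44414_USA_log"

match = PATTERN.search(source)
fields = match.groupdict() if match else {}

for name, value in fields.items():
    print(f"{name}: {value}")
```

The number field is matched by `\d+`, so job numbers with a different digit count (123, 1234, 56789) are handled by the same pattern.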
(?<logdir>               Group logdir
\/(?:[^\/]+\/)*          Match / and optionally repeat any char except / followed by matching the / again
)                        Close group
(?<date>                 Group date
(?:19|20)\d{2}           Match a year starting with 19 or 20
(?:0?[1-9]|1[012])       Match a month
(?:0[1-9]|[12]\d|3[01])  Match a day
)                        Close group
\/job_                   Match /job_
(?<jobname>\w+)          Group jobname, match 1+ word chars
_\d+_                    Match 1+ digits between underscores
(?<region>[^\W_]+)       Group region, match 1+ occurrences of a word char except _
_log                     Match literally
Upvotes: 2