Reputation: 1819
Say I have the following urls:
https://test.com/welcome/
https://sub.test.com/home/edit
https://test.com/home/view?view=column
https://test.com/home/view/?view=list
I would like to capture the following result:
welcome
edit
view
view
Right now I have these three patterns:
(?:\/[^\/]+)+?\/(.*?)/{0,1}$
(?:\/[^\/]+)+?(?:.*\/)(.*?)\?{0,1}$
(?:\/[^\/]+)+?(?:.*\/)(.*)/\?.*$
but they are complicated and I can't seem to combine them.
Upvotes: 2
Views: 809
Reputation: 33435
Go simple: regular expressions are all well and good, but split() is much easier (and, very often, much faster):
index=ndx sourcetype=srctp url=*
| eval url=split(url,"/")
| eval lastpart=mvindex(url,-1)
This splits the field url into a multivalue field using the forward slash ('/') as the delimiter, then selects the last entry using mvindex with an index of -1, which is always the last entry.
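To see what the split-and-take-last idea returns on the four sample URLs, here is a rough Python stand-in (not SPL). `str.split` plays the role of split() and `[-1]` the role of mvindex(..., -1); Splunk may treat empty values differently, so take the first result with a grain of salt. `last_part_cleaned` is a hypothetical helper, not part of the answer above, that drops the query string and trailing slash before splitting.

```python
def last_part_naive(url: str) -> str:
    """Stand-in for split(url, "/") followed by mvindex(..., -1)."""
    return url.split("/")[-1]


def last_part_cleaned(url: str) -> str:
    """Hypothetical variant: drop the query string and trailing slash first."""
    path = url.split("?", 1)[0].rstrip("/")
    return path.split("/")[-1]


urls = [
    "https://test.com/welcome/",
    "https://sub.test.com/home/edit",
    "https://test.com/home/view?view=column",
    "https://test.com/home/view/?view=list",
]

naive = [last_part_naive(u) for u in urls]
cleaned = [last_part_cleaned(u) for u in urls]

for u, n, c in zip(urls, naive, cleaned):
    print(f"{u!r}: naive={n!r} cleaned={c!r}")
```

In this stand-in, only the second URL comes out right with the plain split; the trailing slash and the query string get in the way for the others, which is why the cleaned variant strips them first.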
Upvotes: 1
Reputation: 7
| makeresults
| eval _raw="https://test.com/welcome/
https://sub.test.com/home/edit
https://test.com/home/view?view=column
https://test.com/home/view/?view=list"
| makemv delim="
" _raw
| stats count by _raw
| rex "^.*\/(?<result>\w+)"
Greedy matching is fine here. \w is [a-zA-Z0-9_].
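A quick way to check the greedy pattern outside Splunk is to run it in Python. This is only a sketch: Python spells the named group (?P<result>...) where rex uses (?<result>...), and `last_word_segment` is a name made up for this example.

```python
import re

# Same pattern as the rex above, with Python's named-group spelling.
# re.ASCII keeps \w limited to [a-zA-Z0-9_], as described in the answer.
PATTERN = re.compile(r"^.*/(?P<result>\w+)", re.ASCII)


def last_word_segment(url):
    """Return the run of word characters after the last usable '/'."""
    m = PATTERN.search(url)
    return m.group("result") if m else None


urls = [
    "https://test.com/welcome/",
    "https://sub.test.com/home/edit",
    "https://test.com/home/view?view=column",
    "https://test.com/home/view/?view=list",
]
results = [last_word_segment(u) for u in urls]
print(results)
```

The greedy .* first runs to the end of the string and then backtracks to the last '/' that is followed by at least one word character, which is why a trailing '/' or '/?' is skipped. One thing to watch: since \w excludes '-' and '.', a segment such as my-page would be cut to my.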
Upvotes: 1
Reputation: 110675
You can use the plain-vanilla regex:
(?<=[\/])[^\/?=]+(?=\/?$|\/?\?)
The regex can be written in free-spacing mode [1] to make it self-documenting:
/
(?<=[\/])   # match '/' in a positive lookbehind
[^\/?=]+    # match 1+ chars other than '/', '?' and '='
(?=         # begin a positive lookahead
  \/?$      # optionally match '/' then match end of line
  |         # or
  \/?\?     # optionally match '/' then match '?'
)           # end positive lookahead
/x          # free-spacing mode
[1] I don't know if Splunk supports free-spacing mode, but that doesn't matter, as I am using it merely to show how the regex works.
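For anyone who wants to try the lookaround pattern outside Splunk, here is a small Python sketch. Python's re.VERBOSE flag is its equivalent of free-spacing mode; `LAST_SEGMENT` and `last_segment` are names invented for this example.

```python
import re

LAST_SEGMENT = re.compile(
    r"""
    (?<=/)      # lookbehind: the previous character is '/'
    [^/?=]+     # 1+ chars other than '/', '?' and '='
    (?=         # begin a positive lookahead
        /?$     #   optional '/' then end of string
      |         #   or
        /?\?    #   optional '/' then '?'
    )           # end positive lookahead
    """,
    re.VERBOSE,
)


def last_segment(url):
    """Return the first match of the pattern, or None if there is none."""
    m = LAST_SEGMENT.search(url)
    return m.group(0) if m else None


urls = [
    "https://test.com/welcome/",
    "https://sub.test.com/home/edit",
    "https://test.com/home/view?view=column",
    "https://test.com/home/view/?view=list",
]
results = [last_segment(u) for u in urls]
print(results)
```

Because the match itself is the answer, no capture group is needed; the lookarounds only assert what surrounds the segment.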
Upvotes: 1
Reputation: 626802
In Splunk, you may use a regex that matches all text up to the last occurrence of / that is followed by 1 or more chars other than /, ? or #, and captures those chars with a named capturing group:
".*/(?<lasturlpart>[^/?#]+)"
See the regex demo. Note that the \n or (?:/?(?:[#?].*|$)) in my top comment is used in the demo only to keep the match from overflowing across lines, since the demo input is a single multiline string, while you will be running the regex against standalone strings.
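If you want to sanity-check the pattern against standalone strings without Splunk, a Python sketch along these lines should do. It assumes Python's (?P<lasturlpart>...) spelling for the named group, and `last_url_part` is a hypothetical helper name.

```python
import re

# Same pattern, with Python's spelling of the named capturing group.
PATTERN = re.compile(r".*/(?P<lasturlpart>[^/?#]+)")


def last_url_part(url):
    """Return the captured segment, or None when nothing matches."""
    m = PATTERN.match(url)
    return m.group("lasturlpart") if m else None


urls = [
    "https://test.com/welcome/",
    "https://sub.test.com/home/edit",
    "https://test.com/home/view?view=column",
    "https://test.com/home/view/?view=list",
]
results = [last_url_part(u) for u in urls]
print(results)
```

Excluding # from the character class also stops the capture before a fragment such as #top. Keep in mind that the pattern keys on the last / anywhere in the string, so a query value that itself contains a / (for example ?next=/login) would move the capture into the query string.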
Pattern details
.* - any 0 or more chars other than line break chars, as many as possible
/ - a / char
(?<lasturlpart>[^/?#]+) - Named capturing group matching 1 or more chars other than /, ? and #.
Upvotes: 2