Reputation: 1787
I'm trying to extract specific portions of a URL string. A simplified example: I'm looking for any substring of a URL that starts with "who" or "what", has a total length of either 5 or 10 characters, and stops matching at the first non-alphanumeric character.
For example:
http://www.test.com/who12/foo -> who12 (5-char match starting with "who" and ending at the /)
http://www.test.com/who1234567/foo -> who1234567 (10-char match starting with "who" and ending at the /)
http://www.test.com/what1 -> what1 (5-char match at the end of the string)
http://www.test.com/what1?param=true -> what1 (5-char match, breaking on the ?)
I've tried setting something up here.
It breaks correctly on the / in the 5- and 10-character scenarios, but fails on the ? case and on the case where the match is at the end of the string.
Is there a simpler approach to accomplishing this?
Upvotes: 1
Views: 189
Reputation: 626950
I suggest using
\.com\/\K(?:who[^\/?\s]{2}|what[^\/?\s])(?:[^\/?\s]{5})?
See this regex demo.
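The pattern above relies on PCRE's \K. As a hedged sketch, assuming Python is the host language (the question does not say), note that Python's built-in re module does not support \K. A fixed-width lookbehind (?<=\.com/) is swapped in here to get the same effect of leaving ".com/" out of the reported match. The helper name extract is hypothetical.

```python
import re

# Python's re has no \K, so a fixed-width lookbehind requires ".com/" to
# precede the match without including it in the matched text.
PATTERN = re.compile(r"(?<=\.com/)(?:who[^/?\s]{2}|what[^/?\s])(?:[^/?\s]{5})?")


def extract(url):
    """Return the who/what segment right after '.com/', or None if absent."""
    m = PATTERN.search(url)
    return m.group(0) if m else None


for url in [
    "http://www.test.com/who12/foo",
    "http://www.test.com/who1234567/foo",
    "http://www.test.com/what1",
    "http://www.test.com/what1?param=true",
]:
    print(url, "->", extract(url))
```

Run against the question's four URLs, this yields who12, who1234567, what1 and what1.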
Use a capturing approach if the PCRE \K match reset operator is not supported:
\.com\/((?:who[^\/?\s]{2}|what[^\/?\s])(?:[^\/?\s]{5})?)
See this regex demo.
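A minimal Python sketch of the capturing version, assuming Python is the host language and using the question's four URLs as sample input. Here ".com/" is consumed by the match and the wanted segment is read from group 1. Because the pattern has exactly one group, re.findall returns that group for each match. In Python, "/" needs no escaping, so \/ is written as a plain /.

```python
import re

# Group 1 holds the wanted segment; ".com/" is matched but not captured.
PATTERN = re.compile(r"\.com/((?:who[^/?\s]{2}|what[^/?\s])(?:[^/?\s]{5})?)")

text = """http://www.test.com/who12/foo
http://www.test.com/who1234567/foo
http://www.test.com/what1
http://www.test.com/what1?param=true"""

# With exactly one capturing group, findall returns group 1 of every match.
segments = PATTERN.findall(text)
print(segments)  # ['who12', 'who1234567', 'what1', 'what1']
```

Since \s is excluded from the character classes, a match never runs across the newline into the next URL.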
Details:
- \.com\/ - match .com/ so as to find the necessary left-hand-side context for the text you need
- (?:who[^\/?\s]{2}|what[^\/?\s])(?:[^\/?\s]{5})? - two alternatives and an optional 5 chars after either of them:
  - who[^\/?\s]{2} - who followed by 2 chars other than /, ? and whitespace
  - | - or
  - what[^\/?\s] - what followed by 1 char other than /, ? and whitespace, and then...
  - (?:[^\/?\s]{5})? - an optional 5 chars other than /, ? and whitespace.
Upvotes: 1
Reputation: 2748
Try the following regex.
Regex: (?=.{5,10})(?:who|what)(?:[^\/?\s]*)
Explanation:
(?=.{5,10}) - the lookahead checks for a string length of 5 to 10 characters.
(?:who|what) - matches the literals who or what.
[^\/?\s]* - a negated character class for /, ? and \s (whitespace), so any character other than these will be matched.
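A hedged Python sketch of this pattern, assuming Python as the host language (the helper name extract is hypothetical). The only change is dropping the unneeded escape on "/" inside the character class. The last call illustrates a caveat: the lookahead is not anchored to the end of the segment. It only requires that at least 5 characters of any kind (including "/") follow the current position, so it does not reject segments whose length is other than 5 or 10.

```python
import re

# The second answer's pattern; "/" needs no escape in a Python character class.
PATTERN = re.compile(r"(?=.{5,10})(?:who|what)(?:[^/?\s]*)")


def extract(url):
    """Return the first who/what run found anywhere in the URL, or None."""
    m = PATTERN.search(url)
    return m.group(0) if m else None


print(extract("http://www.test.com/what1?param=true"))  # what1
# The lookahead only needs 5+ characters after "w", so a 6-char segment
# is matched in full rather than rejected.
print(extract("http://www.test.com/who123/foo"))  # who123
```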
Upvotes: 0