Reputation: 91
This is the regex pattern I have so far:
Pattern regex = Pattern.compile("^.*?\\/\\/([^:\\/\\s]+)(.*(?=\\?|\\#))", Pattern.DOTALL);
On the string https://url.spec.whatwg.org/#url-syntax it successfully captures just the / in group 2, since I am trying to stop before ? and #. The problem arises when I try https://url.spec.whatwg.org/ instead.
The whitespace at the end is preventing it from finding / in group 2. I have tried including \p{Blank} in the lookahead, but it had no effect.
"https://www.google.com/search?q=Regular+Expressions&num=1000"
The same goes for the string above: it captures the /search before the ?, but as soon as I try "https://www.google.com/search" it breaks down.
How can I fix this?
Thank you for your time!
Upvotes: 2
Views: 173
Reputation: 1695
This answer assumes that the input is a URL and that you only want the part before the query string or fragment. Try this:
(http)s?:\/\/[^#?]+
You could replace the (http)s? with (.+) if you want your old catch-all approach, although you could also list the protocols explicitly, like (http|ftp|...)s?.
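For reference, here is a minimal Java sketch of the pattern above. The class and method names (UrlPrefixDemo, stripQueryAndFragment, pathOf) are made up for illustration. The second pattern is not from the question; it is my own variant that keeps the two groups (host and path) and uses a negated character class instead of a lookahead, on the assumption that the path is everything after the host up to the first ?, #, whitespace, or end of input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlPrefixDemo {
    // The pattern from this answer as a Java string literal.
    // '/' is not special in Java regex, so \/ can be written as plain /.
    private static final Pattern URL_PREFIX =
            Pattern.compile("(http)s?://[^#?]+");

    // Hypothetical variant keeping the question's two groups:
    // group 1 = host, group 2 = path (may be empty).
    // The negated class stops at '?', '#', or whitespace, and also
    // matches when none of them follows, so no lookahead is needed.
    private static final Pattern HOST_AND_PATH =
            Pattern.compile("^.*?//([^:/\\s?#]+)([^?#\\s]*)");

    // Returns the URL up to (not including) the first '?' or '#',
    // or null if the input does not look like an http(s) URL.
    public static String stripQueryAndFragment(String url) {
        Matcher m = URL_PREFIX.matcher(url);
        return m.find() ? m.group() : null;
    }

    // Returns group 2 of the variant pattern, or null if there is no match.
    public static String pathOf(String url) {
        Matcher m = HOST_AND_PATH.matcher(url);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://url.spec.whatwg.org/#url-syntax",
            "https://url.spec.whatwg.org/",
            "https://www.google.com/search?q=Regular+Expressions&num=1000",
            "https://www.google.com/search"
        };
        for (String s : samples) {
            System.out.println(stripQueryAndFragment(s) + "  path=" + pathOf(s));
        }
    }
}
```

The original lookahead (?=\?|\#) requires a ? or # to actually follow, which is why URLs without either one do not match. If regex is not a hard requirement, java.net.URI with getPath() is probably worth a look as well.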
Upvotes: 2