Reputation: 521
I am trying to match the root of a domain name with a regular expression in JS. I have a problem when the URL does not contain www.
For example, I tried to match against this string:
(http://web.archive.org/web/20080620033027/http://www.mrvc.indianrail.gov.in/overview.htm)
The regex I tried is shown below. I tested it on regex101.com:
/(?<=(\/\/(www\.)|\/\/)).+?(?=\/)/g
I expect the output array to contain web.archive.org
and mrvc.indianrail.gov.in
but I get web.archive.org
and www.mrvc.indianrail.gov.in
with the www. still present in the second case.
Upvotes: 6
Views: 752
Reputation: 634
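First you have to understand how the regex engine matches.
With an alternation (|), the engine tries the alternatives at every position in the input, and it accepts the first alternative that succeeds there.
For example, take the input 123 122
and the pattern (123|12).
The second alternative (12) can match in both words,
because the first two characters of both words are 12; once 12 has matched, the third character is never checked.
I think your intent was for the longer alternative (123 in this example) to be applied first to the whole word, and for the shorter one (12) to be ignored once the longer one has matched.
In your lookbehind, however, the \/\/ alternative is already satisfied at the position right after //, which comes before the position after //www., so the match starts there and still includes www.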
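The behaviour described above can be checked directly in JavaScript. This is only a minimal sketch: the variable names are illustrative, the first part uses the 123 122 example, and the second part runs the regex from the question against the question's URL.

```javascript
const input = "123 122";

// Alternatives are tried left to right at each position in the input.
const longFirst = input.match(/(123|12)/g);
const shortFirst = input.match(/(12|123)/g);
console.log(longFirst);  // [ '123', '12' ]
console.log(shortFirst); // [ '12', '12' ]

// The regex from the question: the lookbehind is already satisfied
// right after "//", so the match starts before "www.".
const url =
  "http://web.archive.org/web/20080620033027/http://www.mrvc.indianrail.gov.in/overview.htm";
const original = url.match(/(?<=(\/\/(www\.)|\/\/)).+?(?=\/)/g);
console.log(original); // [ 'web.archive.org', 'www.mrvc.indianrail.gov.in' ]
```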
I suggest not using a lookbehind, and taking the first capture group ($1) instead, like this:
\/\/(?:www\.)?(.+?)\/
https://regex101.com/r/Ufxzeq/1
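In JavaScript, the capture group can be collected with matchAll. The following is a sketch rather than a definitive implementation; the function name extractHosts is made up for this example, and it assumes every host in the input is followed by a /.

```javascript
// Hypothetical helper: returns every host found after "//",
// with an optional leading "www." left out of the capture group.
function extractHosts(text) {
  const pattern = /\/\/(?:www\.)?(.+?)\//g;
  // matchAll requires the g flag; m[1] is the first capture group.
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const hosts = extractHosts(
  "http://web.archive.org/web/20080620033027/http://www.mrvc.indianrail.gov.in/overview.htm"
);
console.log(hosts); // [ 'web.archive.org', 'mrvc.indianrail.gov.in' ]
```

Because the pattern consumes the // and the trailing /, the full match (m[0]) includes them; only m[1] holds the bare host.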
Upvotes: 0
Reputation: 12448
What about this regex:
(?<=https?:\/\/(?:www\.)?)(?!www\.).+?(?=\/)
It matches web.archive.org
and mrvc.indianrail.gov.in
without the www.
demo: https://regex101.com/r/5ZqK7n/3/
Differences from your initial regex:
s? supports https: URLs (remove it if not necessary).
(?:www\.)? lets www. appear zero or one time inside the lookbehind.
After the lookbehind, the negative lookahead (?!www\.) prevents .+? from starting at the initial www.
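For completeness, here is how this regex could be used in JavaScript. It is a sketch that assumes an engine with lookbehind support (ES2018 or later, for example current Node.js and Chromium-based browsers); the variable names are illustrative.

```javascript
const text =
  "http://web.archive.org/web/20080620033027/http://www.mrvc.indianrail.gov.in/overview.htm";

// Lookbehind: preceded by http:// or https://, optionally followed by www.
// Negative lookahead: the match itself must not start with www.
const hostPattern = /(?<=https?:\/\/(?:www\.)?)(?!www\.).+?(?=\/)/g;

// With the g flag, match() returns all full matches (or null if none).
const found = text.match(hostPattern) ?? [];
console.log(found); // [ 'web.archive.org', 'mrvc.indianrail.gov.in' ]
```

Since both assertions are zero-width, the full match is already the bare host and no capture group is needed.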
Upvotes: 1