Reputation: 697
Basically, I would like to validate a URL and make sure it does not have a subdomain. I can't seem to figure out the correct regex for it.
Example of URLs that SHOULD match:
Example of URLs that SHOULD NOT match:
Upvotes: -4
Views: 56
Reputation: 1445
Try this one, tweaked a bit to handle your /PATH example better:
^(?<OptionalEmail>.*@)?(?<OptionalProtocol>http[s]?:\/\/)?(?:(?<ThirdLevelSubDomain>[\w-]{2,63})\.){0,127}?(?<DomainWithTLD>(?<Domain>[\w-]{2,63})\.(?<TopLevelDomain>[\w-]{2,63}?)(?:\.(?<CountryCode>[a-z]{2}))?)(?:[\/](?<Path>\w+))*(?<QString>(?<QueryStringSeparatorOrExtraJunk>[?&,])(?<QueryStringParams>\w+=\w+))*$
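The answer does not say how the pattern is meant to reject subdomains. One possible reading, sketched below as an assumption, is to run the pattern and then check whether the named group ThirdLevelSubDomain captured anything. The helper name classify and the sample inputs are hypothetical.

```javascript
// Pattern copied verbatim from the answer above (named groups need ES2018+).
const urlPattern = /^(?<OptionalEmail>.*@)?(?<OptionalProtocol>http[s]?:\/\/)?(?:(?<ThirdLevelSubDomain>[\w-]{2,63})\.){0,127}?(?<DomainWithTLD>(?<Domain>[\w-]{2,63})\.(?<TopLevelDomain>[\w-]{2,63}?)(?:\.(?<CountryCode>[a-z]{2}))?)(?:[\/](?<Path>\w+))*(?<QString>(?<QueryStringSeparatorOrExtraJunk>[?&,])(?<QueryStringParams>\w+=\w+))*$/;

// Hypothetical helper: reports whether the subdomain group took part in the match.
function classify(url) {
  const m = urlPattern.exec(url);
  if (m === null) {
    return "no match";
  }
  // The group is undefined when the lazy {0,127}? repetition matched zero times.
  return m.groups.ThirdLevelSubDomain === undefined ? "no subdomain" : "has subdomain";
}

for (const url of ["example.com", "example.co.uk", "example.com/page", "test.example.com"]) {
  console.log(url, "->", classify(url));
}
```

With this reading, www is treated like any other subdomain, and because the pattern does not know which suffixes are real TLDs, an input whose last label is two letters can be read either way. Treat it as a starting point rather than a complete validator.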
Upvotes: 0
Reputation: 27743
Here, we would start with an expression that is bounded on the right by .com or .co.uk (and others, if desired). Then we would sweep to the left to collect all non-dot characters, add an optional www. and an optional http:// or https://, and finally add a start anchor ^, which makes every URL with a subdomain fail:
^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$
Other TLDs can be added to this capturing group:
(\.com|\.co\.uk|\.net|\.org|\.business|\.edu|\.careers|\.coffee|\.college)
And the expression can be modified to:
^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk|\.net|\.org|\.business|\.edu|\.careers|\.coffee|\.college)(.+|)$
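Since the TLD group is just an alternation, it could also be generated from a list instead of being typed by hand. The following is a minimal sketch under that assumption; the names allowedTlds, escapeRegex and buildPattern are hypothetical and not part of the answer.

```javascript
// Hypothetical list of accepted suffixes; extend as needed.
const allowedTlds = ["com", "co.uk", "net", "org", "business", "edu", "careers", "coffee", "college"];

// Escape regex metacharacters so that "co.uk" becomes "co\.uk".
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the same shape of expression as above, with the TLD group generated from the list.
function buildPattern(tlds) {
  const alternation = tlds.map((tld) => "\\." + escapeRegex(tld)).join("|");
  return new RegExp("^(https?:\\/\\/)?(www\\.)?([^.]+)(" + alternation + ")(.+|)$");
}

const pattern = buildPattern(allowedTlds);

for (const url of ["example.com", "www.example.com", "example.co.uk", "https://example.net/page", "test.example.com"]) {
  console.log(url, "->", pattern.test(url));
}
```

The generated pattern keeps the trailing (.+|) group from the original, so anything after the TLD is still accepted without further checks.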
I can't think of a good way to make the TLD part much more flexible, since this is a validation expression. For instance, if we simplified it to:
^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$
it might work for the URLs listed in the question, but it would also pass:
example.example
which is invalid. We can only use this expression:
^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$
if we already know that what we pass in is a URL.
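The difference between the two expressions can be checked quickly. This is a small sketch that uses the two patterns from this answer; the variable names strict and loose are only labels chosen here.

```javascript
// Strict version: the TLD must come from an explicit list.
const strict = /^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$/;
// Loose version: any run of lowercase letters is accepted as a TLD.
const loose = /^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$/;

for (const input of ["example.com", "example.example", "test.example.com"]) {
  console.log(input, "strict:", strict.test(input), "loose:", loose.test(input));
}
```

Both versions reject test.example.com, but only the strict one rejects example.example.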
This snippet just shows how the capturing groups work:
const regex = /^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$/gm;
const str = `example.com
www.example.com
example.co.uk
example.com/page
example.com?key=value
test.example.com
sub.test.example.com`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
jex.im can be used to visualize regular expressions.
If this expression isn't what you need, it can be modified and tested on regex101.com.
Upvotes: 0