franco

Reputation: 697

RegEx for failing subdomains

Basically, I would like to check that a URL is valid and has no subdomain. I can't seem to figure out the correct regex for it.

Example of URLs that SHOULD match:

Example of URLs that SHOULD NOT match:

Upvotes: -4

Views: 56

Answers (2)

m1m1k

Reputation: 1445

Try this one (tweaked a bit to handle your /PATH example better):

^(?<OptionalEmail>.*@)?(?<OptionalProtocol>http[s]?:\/\/)?(?:(?<ThirdLevelSubDomain>[\w-]{2,63})\.){0,127}?(?<DomainWithTLD>(?<Domain>[\w-]{2,63})\.(?<TopLevelDomain>[\w-]{2,63}?)(?:\.(?<CountryCode>[a-z]{2}))?)(?:[\/](?<Path>\w+))*(?<QString>(?<QueryStringSeparatorOrExtraJunk>[?&,])(?<QueryStringParams>\w+=\w+))*$
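One possible way to apply this expression to the question, sketched under assumptions: since the groups are named, a URL can be treated as having no subdomain when the ThirdLevelSubDomain group did not take part in the match. The helper name subdomainOf is hypothetical and not part of the answer. Note that with this pattern, www is reported as a subdomain as well.

```javascript
// The pattern from this answer, unchanged, written as a JavaScript regex literal.
const urlPattern = /^(?<OptionalEmail>.*@)?(?<OptionalProtocol>http[s]?:\/\/)?(?:(?<ThirdLevelSubDomain>[\w-]{2,63})\.){0,127}?(?<DomainWithTLD>(?<Domain>[\w-]{2,63})\.(?<TopLevelDomain>[\w-]{2,63}?)(?:\.(?<CountryCode>[a-z]{2}))?)(?:[\/](?<Path>\w+))*(?<QString>(?<QueryStringSeparatorOrExtraJunk>[?&,])(?<QueryStringParams>\w+=\w+))*$/;

// Hypothetical helper: returns the captured subdomain label, undefined when the
// URL matched without a subdomain, or null when the input did not match at all.
function subdomainOf(url) {
  const m = urlPattern.exec(url);
  if (m === null) {
    return null;
  }
  return m.groups.ThirdLevelSubDomain;
}

for (const url of ['example.com', 'example.co.uk', 'test.example.com']) {
  console.log(url, '->', subdomainOf(url));
}
```

A URL would then be accepted only when subdomainOf returns undefined.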

Upvotes: 0

Emma

Reputation: 27743

Here, we would start with an expression that is bounded on the right by .com, .co.uk, or other TLDs if desired. Then we would scan to the left to collect all non-dot characters, add an optional www. and an optional http:// or https://, and finally add the start anchor ^, which makes any URL with a subdomain fail:

^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$

Other TLDs can be added to this capturing group:

(\.com|\.co\.uk|\.net|\.org|\.business|\.edu|\.careers|\.coffee|\.college)

And the expression can be modified to:

^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk|\.net|\.org|\.business|\.edu|\.careers|\.coffee|\.college)(.+|)$
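If the TLD list keeps growing, the same expression could be assembled from an array instead of being edited by hand. This is only a minimal sketch: the names tlds, escapeRegex, noSubdomain, and isBareDomain are made up for illustration, and the list is just the one shown above.

```javascript
// TLDs taken from the capturing group above; extend as needed.
const tlds = ['com', 'co.uk', 'net', 'org', 'business', 'edu', 'careers', 'coffee', 'college'];

// Escape regex metacharacters so that "co.uk" becomes "co\.uk".
const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build the alternation: \.com|\.co\.uk|\.net|...
const tldGroup = tlds.map((t) => '\\.' + escapeRegex(t)).join('|');

// Same shape as the expression above: optional scheme, optional www.,
// one dot-free label, one listed TLD, then anything (or nothing).
const noSubdomain = new RegExp('^(https?:\\/\\/)?(www\\.)?([^.]+)(' + tldGroup + ')(.+|)$');

// Hypothetical helper: true when the URL has no subdomain other than www.
function isBareDomain(url) {
  return noSubdomain.test(url);
}

for (const url of ['example.com', 'https://www.example.co.uk/page', 'test.example.com']) {
  console.log(url, '->', isBareDomain(url));
}
```

The resulting regex is meant to behave like the hand-written one; only the way the TLD group is produced differs.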

Flexibility

I can't think of a way to make the TLD part much more flexible, since this is a validation expression. For instance, if we simplified it to:

^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$

it might work for the URLs listed in the question, but it would also pass:

example.example

which is invalid. We can only use this expression:

^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$

if we already know that the input is a URL.
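The trade-off can be checked with a few inputs. The snippet below is just an illustrative sketch; the names loose and samples are not from the answer, and the inputs are the ones discussed in this thread.

```javascript
// The simplified expression from above, with the TLD left open.
const loose = /^(https?:\/\/)?(www\.)?([^.]+)(\.[a-z]+)(\.uk?)?[a-z?=\/]+$/;

const samples = [
  'example.com',        // a URL from the question
  'example.co.uk',      // a URL from the question
  'test.example.com',   // has a subdomain
  'example.example',    // not a real URL, but shaped like one
];

// Collect the result for each input so it can be inspected or logged.
const results = samples.map((s) => [s, loose.test(s)]);

for (const [input, passed] of results) {
  console.log(input, '->', passed);
}
```

The subdomain input is still rejected, but example.example is accepted, which is why the open-ended version is only suitable when the input is already known to be a URL.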

NOT FUNCTIONAL DEMO

Demo

This snippet just shows how the capturing groups work:

const regex = /^(https?:\/\/)?(www\.)?([^.]+)(\.com|\.co\.uk)(.+|)$/gm;
const str = `example.com
www.example.com
example.co.uk
example.com/page
example.com?key=value

test.example.com
sub.test.example.com`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

RegEx Circuit

jex.im can be used to visualize regular expressions.

RegEx

If this expression isn't what you want, it can be modified or changed at regex101.com.


Upvotes: 0
