Remco Bravenboer

Reputation: 133

What am I doing wrong in this regex?

I have a school assignment about regex. I will explain it first.

I have to write a regex for checking URLs, the conditions I have to check are:

Here is the regex I currently have:

(https?|ftps?):\/\/(www\.)?[a-z]+\.[a-z]+\.(nl|edu)$

My URL is:

http://www.lib.hva.nl

The URL currently passes the regex, but when I remove .lib or .hva, for example, it still passes, and that should not happen. When the domain starts with www., it should have four levels. Could someone help me out with this issue?

Upvotes: 2

Views: 170

Answers (3)

Intenso17

Reputation: 1

You can also use {n} for exactly n occurrences, which is sometimes more readable. It also makes it easy to change the number of subdomain levels.

(https?|ftps?):\/\/(www\.)?+([a-z]+\.){2}(nl|edu)$

Upvotes: 0

Nahuel Fouilleul

Reputation: 19315

This can be resolved by adding a possessive quantifier + after (www\.)?

(https?|ftps?):\/\/(www\.)?+[a-z]+\.[a-z]+\.(nl|edu)$

Explanation: the original regex

(https?|ftps?):\/\/(www\.)?[a-z]+\.[a-z]+\.(nl|edu)$

matches

http://www.lib.nl

because after the first attempt fails, the engine backtracks to (www\.)? and retries without the optional group. Since [a-z]+\. also matches www., the match then succeeds. To prevent backtracking into (www\.)?, a possessive quantifier can be used.
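The backtracking described above can be observed with a short Python sketch. The extra capture groups around the two `[a-z]+` labels are an addition for illustration only (they are not part of the original regex), and the constant name `INSTRUMENTED` is hypothetical.

```python
import re

# Same pattern as in the question, but with the two [a-z]+ labels wrapped in
# capture groups (an illustrative addition) so we can see what they consumed.
INSTRUMENTED = re.compile(
    r"(https?|ftps?):\/\/(www\.)?([a-z]+)\.([a-z]+)\.(nl|edu)$"
)

m = INSTRUMENTED.match("http://www.lib.nl")

# The URL is accepted even though it has only three levels ...
print(m is not None)

# ... because the first [a-z]+ label ended up consuming "www" itself.
print(m.group(3), m.group(4), m.group(5))
```

After backtracking, the optional `(www\.)?` matches nothing and the first label takes over the `www` text, which is why the three-level URL slips through.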

Other options are a negative lookahead or an atomic group (as in the regex101 link).
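As a rough check of the possessive and atomic variants outside regex101, here is a minimal Python sketch. It assumes Python 3.11 or later, where the `re` module accepts possessive quantifiers and atomic groups; the helper name `matches` and the version guard are illustrative assumptions, not part of the answer.

```python
import re
import sys

# Possessive quantifier after the optional group, as in the answer above.
POSSESSIVE = r"(https?|ftps?):\/\/(www\.)?+[a-z]+\.[a-z]+\.(nl|edu)$"
# Atomic-group variant: (?>...) also refuses to give back a matched "www.".
ATOMIC = r"(https?|ftps?):\/\/(?>(www\.)?)[a-z]+\.[a-z]+\.(nl|edu)$"


def matches(pattern: str, url: str) -> bool:
    """Hypothetical helper: True if url matches pattern from the start."""
    if sys.version_info < (3, 11):
        # Older versions of re reject this syntax when compiling.
        raise RuntimeError("possessive/atomic syntax needs Python 3.11+")
    return re.match(pattern, url) is not None


if sys.version_info >= (3, 11):
    for url in ("http://www.lib.hva.nl", "http://lib.hva.nl", "http://www.lib.nl"):
        print(url, matches(POSSESSIVE, url), matches(ATOMIC, url))
```

With either variant, once `www.` has been consumed the engine cannot retry without it, so `http://www.lib.nl` no longer matches while the four-level and non-www three-level URLs still do.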

can be checked on regex101

Upvotes: 10

Josh Withee

Reputation: 11336

The issue is that [a-z]+ also matches the text www. To prevent this, put a negative lookahead assertion before your first instance of [a-z]+, like this:

(https?|ftps?):\/\/(www\.)?(?!www\.)[a-z]+\.[a-z]+\.(nl|edu)$
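A quick way to try the lookahead version is a small Python sketch like the one below. The function name `is_valid` and the sample URLs other than the two from the thread are assumptions made for illustration.

```python
import re

# Regex from this answer: (?!www\.) forbids the first label from being "www".
LOOKAHEAD = re.compile(
    r"(https?|ftps?):\/\/(www\.)?(?!www\.)[a-z]+\.[a-z]+\.(nl|edu)$"
)


def is_valid(url: str) -> bool:
    """Hypothetical helper: True if url matches the lookahead pattern."""
    return LOOKAHEAD.match(url) is not None


for url in (
    "http://www.lib.hva.nl",  # four levels with www.
    "http://lib.hva.nl",      # three levels without www.
    "http://www.lib.nl",      # www. plus only two more levels
    "http://www.hva.nl",      # same shape as the previous one
):
    print(url, is_valid(url))
```

One side effect of this approach is that a URL whose first label after an optional `www.` is itself `www` (for example `http://www.www.hva.nl`) is rejected as well.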

Upvotes: 2
