Reputation: 2214
Here's my regex:
\b(https?|www)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]*[.]{1,256}
I know I'm doing something wrong because I use RegEx very rarely.
The idea of the last [.]{1,256} was to make sure there is at least one "." in the match.
Without it, I got a match for "https://www", so I wanted to make sure that at least one dot exists.
But with the expression above, the match is cut off at the first dot instead of covering the whole URL.
Upvotes: 1
Views: 1832
Reputation: 626871
First of all, www before :// does not make much sense: it can only occur after ://, so it can be removed.
Both [-a-zA-Z0-9+&@#/%?=~_|!:,.;]* and [-a-zA-Z0-9+&@#/%=~_|]* can match an empty string, and the [.]{1,256} at the end of your pattern matches 1 to 256 dots. That is why the pattern has to end on a dot, so your matches only go up to a dot.
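To see the truncation concretely, here is a minimal sketch using Python's re module. The choice of Python and the sample URL are assumptions for illustration only; the pattern itself is copied unchanged from the question.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(
    r"\b(https?|www)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]*[.]{1,256}"
)

# Hypothetical sample input, not taken from the question.
sample = "https://www.example.com"

match = ORIGINAL.search(sample)
truncated = match.group(0)

# The trailing [.]{1,256} must consume at least one dot, so the engine
# backtracks until the match can end on a dot and "com" is dropped.
print(truncated)  # https://www.example.
```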
You may refactor the pattern to match all the chars you allow except a dot, then match a dot, and then match any number of the chars you allow, this time including a dot:
\bhttps?://[-a-zA-Z0-9+&@#/%?=~_|!:,;]*\.[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*
Here,
[-a-zA-Z0-9+&@#/%?=~_|!:,;]* - matches 0 or more chars you allow, except a dot
\. - matches a dot
[-a-zA-Z0-9+&@#/%?=~_|!:,.;]* - matches 0 or more allowed chars, including a dot.
So, at least 1 dot will get matched.
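A quick way to check the refactored pattern is sketched below with Python's re module. The language, the helper name find_urls, and the sample strings are assumptions for illustration; only the pattern comes from the answer.

```python
import re

# Refactored pattern: allowed chars without a dot, one literal dot,
# then allowed chars with dots permitted.
REFACTORED = re.compile(
    r"\bhttps?://[-a-zA-Z0-9+&@#/%?=~_|!:,;]*\.[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*"
)


def find_urls(text):
    """Return every substring of text matched by the refactored pattern."""
    return [m.group(0) for m in REFACTORED.finditer(text)]


# Hypothetical sample inputs.
print(find_urls("https://www.example.com"))           # full URL is kept
print(find_urls("https://www"))                       # no dot, so no match
print(find_urls("see http://example.org/a?b=1 now"))  # URL inside a sentence
```

Note that the final character class still allows a dot, so a URL directly followed by a sentence-ending period would include that period in the match.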
Upvotes: 2