Reputation: 2421
I use the following regex (an updated version of the linkify regex) to match links without matching emails.
(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?((?:(?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+]*\.)+))(?:[a-zA-Z]{2,})((?::\d{1,5}))?((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)([^a-zA-Z0-9\+_\/"\<\-]|$)
However, this regex does not match URLs like: https://someurl:3333/view/something
Can you please help me with this? Thanks!
Upvotes: 0
Views: 153
Reputation: 20486
This should be the "least modified" version of your expression that also matches hosts without a top-level domain:
(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?((?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+.]*)(?:\.[a-zA-Z]{2,})?)((?::\d{1,5}))?((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)([^a-zA-Z0-9\+_\/"\<\-]|$)
The part that changed was capture group 4, the one that grabs the domain. It went from:
(
  (?:
    (?:
      [a-zA-Z0-9]
      [a-zA-Z0-9_%\-_+]*
      \.
    )+ (?# this is how they repeated for optional subdomains)
  )
)
(?:
  [a-zA-Z]{2,} (?# here is the mandatory TLD)
)
To this:
(
  (?:
    [a-zA-Z0-9]
    [a-zA-Z0-9_%\-_+.]* (?# the . is in the character class here for subdomains)
  )
  (?:
    \.
    [a-zA-Z]{2,}
  )? (?# this TLD is optional)
)
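As a quick sanity check, here is a minimal sketch that runs both expressions against the URL from the question. It assumes Python's `re` module, since the question does not say which regex engine is in use. The names `PREFIX`, `SUFFIX`, `ORIGINAL`, `MODIFIED` and `URL` are only illustrative. The two patterns share everything except the domain part, so they are assembled from common pieces.

```python
import re

# Everything before the domain part (identical in both versions).
PREFIX = (
    r'(\s*|[^a-zA-Z0-9.\+_\/"\>\-]|^)'
    r'(?:([a-zA-Z0-9\+_\-]+(?:\.[a-zA-Z0-9\+_\-]+)*@)?'
    r'(http:\/\/|https:\/\/|ftp:\/\/|scp:\/\/){1}?'
)

# Everything after the domain part: port, path, trailing delimiter.
SUFFIX = (
    r'((?::\d{1,5}))?'
    r'((?:[\/|\?](?:[\-a-zA-Z0-9_%#*&+=~!?,;:.\/]*)*)[\-\/a-zA-Z0-9_%#*&+=~]|\/?)?)'
    r'([^a-zA-Z0-9\+_\/"\<\-]|$)'
)

# Domain part from the question: one or more "label." pieces, then a mandatory TLD.
ORIGINAL = re.compile(
    PREFIX
    + r'((?:(?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+]*\.)+))(?:[a-zA-Z]{2,})'
    + SUFFIX
)

# Domain part from the answer: dots allowed inside the host, TLD optional.
MODIFIED = re.compile(
    PREFIX
    + r'((?:[a-zA-Z0-9][a-zA-Z0-9_%\-_+.]*)(?:\.[a-zA-Z]{2,})?)'
    + SUFFIX
)

URL = "https://someurl:3333/view/something"

print(ORIGINAL.search(URL))  # None: "someurl" has no ".tld" part

m = MODIFIED.search(URL)
# Groups 3-6 are scheme, host, port and path.
print(m.group(3, 4, 5, 6))  # ('https://', 'someurl', ':3333', '/view/something')
```

Note that group 1 and the last group may capture a surrounding delimiter character, so joining groups 3 to 6 is a safer way to rebuild the URL than using the whole match.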
Upvotes: 1