Reputation: 51
The regular expression posted below is used to pick up URLs, including ones in the format example.com. However, I want it to pick up only URLs that start with www., http, https, etc. In other words, it should pick up www.example.com, but it should not pick up example.com.
((((ht|f)tp(s?))\://)?((www.|[a-zA-Z])([a-zA-Z0-9\-]+\.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*)
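Here is a quick way to confirm the behavior described above. This is a minimal Python sketch; Python and the `matches` helper are illustrative and not part of the question. Two things let bare domains through: the scheme group is optional (it ends in `?`), and the `[a-zA-Z]` alternative in `(www.|[a-zA-Z])` accepts any leading letter in place of `www.`. Note also that the dot in `www.` is unescaped, so it matches any character.

```python
import re

# The question's pattern, copied verbatim.
ORIGINAL = r"((((ht|f)tp(s?))\://)?((www.|[a-zA-Z])([a-zA-Z0-9\-]+\.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*)"


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# All three print True: the scheme group is optional, and the [a-zA-Z]
# alternative accepts any leading letter where "www." would go.
for candidate in ["www.example.com", "http://example.com", "example.com"]:
    print(candidate, matches(ORIGINAL, candidate))
```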
Upvotes: 2
Views: 1410
Reputation: 154543
Here you go:
\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»“”‘’\s]))
It's the revised Liberal URL Regex from Daring Fireball.
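A minimal Python sketch of using it to pull URLs out of running text. The sample sentence is made up. The pattern only lists lowercase letters, so it is compiled with `re.IGNORECASE` here (an assumption about intended use) so that `HTTP://` or `WWW.` also match. Because a match must start with either `word:` (a scheme) or `www.`, a bare `example.net` is never picked up. The scheme branch is liberal, though, and also accepts schemes such as `mailto:`.

```python
import re

# Revised "Liberal URL Regex" from the answer above, copied verbatim.
# Compiled case-insensitively because the pattern only spells out [a-z].
GRUBER = re.compile(
    r"""\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.])(?:[^\s()<>]+|\([^\s()<>]+\))+(?:\([^\s()<>]+\)|[^`!()\[\]{};:'".,<>?«»“”‘’\s]))""",
    re.IGNORECASE,
)

text = "Visit www.example.com or http://example.org/path, but not example.net."
# group(0) is the whole match; trailing punctuation such as "," is excluded.
found = [m.group(0) for m in GRUBER.finditer(text)]
print(found)  # ['www.example.com', 'http://example.org/path']
```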
Upvotes: 0
Reputation: 27486
Hmmm, try:
(((((ht|f)tp(s?))\://)|(www\.))((|[a-zA-Z])([a-zA-Z0-9-]+.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*)
EDIT: Yeah, I didn't really test that one. OK, I didn't test this one either, but I looked at it REALLY carefully ;)
(((((ht|f)tp(s?))\://)|(www\.))(([a-zA-Z0-9-]+.)?([a-zA-Z0-9]+\.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*)
You should look into a good regex tester. I usually use Expresso but there are many others out there.
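In the same spirit, here is a small Python test harness for the edited regex. Python, `CASES`, `run`, and the `ESCAPED` variant are illustrative assumptions and not part of the answer. It also shows that the first `.` in `([a-zA-Z0-9-]+.)?` is unescaped, so it matches any character there. For example, `www.a!b.com` is accepted.

```python
import re

# The edited regex from this answer, copied verbatim.
ANSWER = r"(((((ht|f)tp(s?))\://)|(www\.))(([a-zA-Z0-9-]+.)?([a-zA-Z0-9]+\.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*)"
# Hypothetical variant: same regex with that first dot escaped to a literal ".".
ESCAPED = ANSWER.replace(r"+.)?", r"+\.)?", 1)

# Hypothetical test table: (input, should the whole string match?)
CASES = [
    ("www.example.com", True),
    ("http://example.com", True),
    ("https://www.example.com:8080/path", True),
    ("example.com", False),
]


def run(pattern: str) -> None:
    """Print PASS/FAIL for each case, using a whole-string match."""
    for text, expected in CASES:
        got = re.fullmatch(pattern, text) is not None
        print("PASS" if got == expected else "FAIL", text)


run(ANSWER)
run(ESCAPED)
# The unescaped dot lets "!" stand in for "." in the original version:
print(bool(re.fullmatch(ANSWER, "www.a!b.com")))   # True
print(bool(re.fullmatch(ESCAPED, "www.a!b.com")))  # False
```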
Upvotes: 1
Reputation: 107989
Validate that the URI is well-formed with a regexp; use the one from RFC 3986 (Appendix B). Validate that it is plausible with code. Trying to combine the well-formedness and plausibility checks into one regexp is too difficult to get right. See: Need a regex to validating a Url...
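A minimal Python sketch of that split. The regex is the one printed in RFC 3986, Appendix B, where scheme, authority, and path are groups 2, 4, and 5. The plausibility rules are assumptions chosen to fit the question: a known scheme plus a host, or no scheme and a `www.` prefix. `is_plausible_url` and `ALLOWED_SCHEMES` are hypothetical names.

```python
import re

# RFC 3986, Appendix B: splits any URI reference into its components.
RFC3986 = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

ALLOWED_SCHEMES = {"http", "https", "ftp"}  # assumption: schemes we accept


def is_plausible_url(text: str) -> bool:
    """Well-formedness via the RFC regex, plausibility via plain code."""
    m = RFC3986.match(text)
    if m is None:  # defensive: every part is optional, so match() succeeds
        return False
    scheme, authority, path = m.group(2), m.group(4), m.group(5)
    if scheme is not None:
        # Explicit scheme: require a known one and a non-empty host.
        return scheme.lower() in ALLOWED_SCHEMES and bool(authority)
    # No scheme: "www.example.com" parses as a relative path, so accept it
    # only if its first segment starts with "www." and has another dot.
    host = path.split("/", 1)[0]
    return host.lower().startswith("www.") and "." in host[4:]
```

For example, `http://example.com` and `www.example.com` pass, while `example.com` and `http://` do not. Because `www.example.com` is syntactically a valid scheme name, `www.example.com:8080` is parsed with that as its scheme and is rejected here. This is the kind of case that is easier to handle in code than in a single regexp.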
Upvotes: 1
Reputation: 2836
I modified your expression:
((((ht|f)tp(s?))\://)?((www\.)([a-zA-Z0-9-]+\.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*)
A pretty good website for checking your expressions: http://gskinner.com/RegExr/
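Here is a quick Python check of this version. Python and the sample inputs are illustrative. Because `(www\.)` is now mandatory, a whole-string match rejects a bare `example.com` as requested. However, it also rejects scheme-only URLs such as `http://example.com`. It allows exactly one label between `www.` and the top-level domain, so `www.example.co.uk` does not fully match either.

```python
import re

# The modified regex from this answer, copied verbatim: "www." is required.
MODIFIED = r"((((ht|f)tp(s?))\://)?((www\.)([a-zA-Z0-9-]+\.)([a-zA-Z]{2,8}))(\:[0-9]+)*(/($|[a-zA-Z0-9.\,\;\?\'\+&%\$#\=~_-]+))*)"

# Illustrative inputs; prints whether each whole string matches.
for text in ["www.example.com", "http://www.example.com", "example.com",
             "http://example.com", "www.example.co.uk"]:
    print(text, re.fullmatch(MODIFIED, text) is not None)
```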
Upvotes: 0