Reputation: 4826
I have written a regex to match URLs so that I can do a str_replace() on posts in a comment system and turn naked links into active, clickable links.
This pattern:
(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\??(([a-zA-Z0-9]*=[a-zA-Z0-9]*)&?)*\/?
matches URLs quite nicely, but it fails on this line:
"I know that but your name is not on the list see... http://screencast.com/t/ccccccc"
It is matching the [see... http] part.
What's wrong?
Upvotes: -2
Views: 120
Reputation: 149050
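The part of the pattern that matches the protocol (the http:// or https://) is optional, so a match can begin at ordinary text. Here, see... satisfies the host and TLD parts because both character classes allow dots, and the space in the path character class lets the match run on through " http". Also, the part of the pattern that is intended to match the query string of the URL (the part after the ?) puts the & separator after each key=value pair rather than before it.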
Correct these two issues and it should work:
(https?:\/\/)([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\??(&?([a-zA-Z0-9]*=[a-zA-Z0-9]*))*
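To see the difference on the line from the question, here is a minimal sketch. Python's re module stands in for whatever regex engine the comment system uses (an assumption; both patterns use only syntax that behaves the same there), and first_match is a hypothetical helper name.

```python
import re

# Pattern from the question: the protocol group is optional.
ORIGINAL = r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\??(([a-zA-Z0-9]*=[a-zA-Z0-9]*)&?)*\/?"

# Corrected pattern from this answer: the protocol group is required.
CORRECTED = r"(https?:\/\/)([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\??(&?([a-zA-Z0-9]*=[a-zA-Z0-9]*))*"

line = "I know that but your name is not on the list see... http://screencast.com/t/ccccccc"


def first_match(pattern, text):
    """Return the text of the leftmost match, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


original_hit = first_match(ORIGINAL, line)
corrected_hit = first_match(CORRECTED, line)

print(original_hit)   # see... http
print(corrected_hit)  # http://screencast.com/t/ccccccc
```

Note that the corrected pattern still allows a space in the path character class, so it can run past the end of a URL that is followed by more words.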
But we can improve this a bit:
(https?://)([\da-z.-]+(\.[a-z0-9-]+)+(\:\d+)?)(/[\w.-]*)*(\?\S+)?
Of course, this is still an approximation. For a more detailed and complete pattern, you should probably read In search of the perfect URL validation regex, where the author provides a number of patterns and shows their strengths and weaknesses.
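To tie this back to the original goal of turning naked links into clickable ones, here is a minimal sketch built on the improved pattern above. Several things here are assumptions rather than part of the answer: Python stands in for the comment system's own language, linkify is a hypothetical helper name, the host part is wrapped in its own group so that the parentheses balance, and the input text is assumed to be HTML-escaped already.

```python
import re

# Improved pattern from the answer. The host is wrapped in its own group so the
# parentheses balance (an assumption about the intended grouping).
URL_RE = re.compile(r"(https?://)([\da-z.-]+(\.[a-z0-9-]+)+(\:\d+)?)(/[\w.-]*)*(\?\S+)?")


def linkify(text):
    """Wrap every matched URL in an anchor tag (hypothetical helper).

    Assumes `text` has already been HTML-escaped; no escaping is done here.
    """
    def to_anchor(match):
        url = match.group(0)
        return f'<a href="{url}">{url}</a>'

    return URL_RE.sub(to_anchor, text)


print(linkify("your name is not on the list see... http://screencast.com/t/ccccccc"))
```

Because the protocol is required and the path class no longer contains a space, the leading "see..." is left alone and the match stops at the first whitespace after the URL.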
Upvotes: 1