CODEWITHSUNDEEP

regex url replace hyperlink text-parsing

DrDamnit

Reputation: 4826

Regex is matching a non-url

I have written a regex to match URLs so that I can run str_replace() on posts in a comment system and replace naked links with active, clickable links.

This works quite well:

(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\??(([a-zA-Z0-9]*=[a-zA-Z0-9]*)&?)*\/?

It matches URLs quite nicely, but it fails on this line:

"I know that but your name is not on the list see... http://screencast.com/t/ccccccc"

It is matching the [see... http] part.

What's wrong?

Upvotes: -2

Views: 120

Answers (2)

p.s.w.g

Reputation: 149050

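The part of the pattern that matches the protocol (the http:// or https://) is optional, so the rest of the pattern can match text that has no protocol at all. In "see... http", the host part matches "see", the TLD part matches "..", and the path part, whose character class includes a space, matches " http". Also, the part of the pattern intended to match the query of the URL (the part after the ?) needs adjusting: the corrected pattern below puts the optional & before each key=value pair instead of after it, and drops the trailing \/?.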

Correct these two issues and it should work:

(https?:\/\/)([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\??(&?([a-zA-Z0-9]*=[a-zA-Z0-9]*))*
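To see the difference on the sentence from the question, here is a minimal sketch. It uses Python's re module purely for illustration (the question mentions str_replace(), so the real code is probably PHP, but these patterns use syntax both engines accept). The names ORIGINAL, CORRECTED and first_match are made up for this example.

```python
import re

# Pattern from the question: the protocol group is optional.
ORIGINAL = r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\??(([a-zA-Z0-9]*=[a-zA-Z0-9]*)&?)*\/?"

# Corrected pattern from this answer: the protocol group is required.
CORRECTED = r"(https?:\/\/)([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\??(&?([a-zA-Z0-9]*=[a-zA-Z0-9]*))*"

text = "I know that but your name is not on the list see... http://screencast.com/t/ccccccc"


def first_match(pattern, s):
    """Return the text of the first match of pattern in s, or None (hypothetical helper)."""
    m = re.search(pattern, s)
    return m.group(0) if m else None


print(first_match(ORIGINAL, text))   # -> see... http
print(first_match(CORRECTED, text))  # -> http://screencast.com/t/ccccccc
```

With the protocol required, the search can only start at "http://", so the "see..." fragment is no longer a candidate.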


But we can improve this a bit:

(https?://)([\da-z.-]+(\.[a-z0-9-]+)+(\:\d+)?)(/[\w.-]*)*(\?\S+)?
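As a rough sketch of how the improved pattern could drive the link replacement described in the question, the snippet below wraps each match in an anchor tag. It is written in Python only for illustration, with the pattern's parentheses balanced so that the host and optional port form one group. The names URL_RE and linkify, and the choice to HTML-escape the URL, are assumptions made for this example and are not part of the original answer.

```python
import re
from html import escape

# Improved pattern from the answer, with the host and optional port in one group.
URL_RE = re.compile(r"(https?://)([\da-z.-]+(\.[a-z0-9-]+)+(:\d+)?)(/[\w.-]*)*(\?\S+)?")


def linkify(text):
    """Wrap every URL matched by URL_RE in an <a> tag (hypothetical helper)."""
    def to_anchor(match):
        # Escape the URL so characters such as & and " are safe inside the attribute.
        url = escape(match.group(0), quote=True)
        return '<a href="{0}">{0}</a>'.format(url)

    # Text outside the matches is returned unchanged.
    return URL_RE.sub(to_anchor, text)


print(linkify("see... http://screencast.com/t/ccccccc"))
# -> see... <a href="http://screencast.com/t/ccccccc">http://screencast.com/t/ccccccc</a>
```

Because the path part no longer allows spaces, the match stops at the end of the URL instead of running on into the following words.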


Of course, this is still an approximation. For a more detailed and complete pattern you should probably read "In search of the perfect URL validation regex", where the author provides a number of patterns and shows their strengths and weaknesses.

Upvotes: 1

revo

Reputation: 48741

(https?:\/\/)([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\??(([a-zA-Z0-9]*=[a-zA-Z0-9]*)&?)*\/?

Upvotes: 0
