CyberJunkie

Reputation: 22684

Validate URL by regex and filter_var

I have been searching for the best way to validate a URL in PHP and decided to use both a regex and filter_var(). I would like to share my code and get some feedback, please.

function _valid_urls($str) {

    $regex = "/^(http):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i";

    if(!filter_var($str, FILTER_VALIDATE_URL) || (!preg_match($regex, $str))) //if invalid URL
    {
        return FALSE;
    }
    else
    {
        return TRUE;
    }
}

The code works but I'm not entirely sure if it's secure.

EDIT:

I found the most efficient regex for website URLs to be /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \?=.-]*)*\/?$/

http://www.catswhocode.com/blog/10-regular-expressions-for-efficient-web-development

Upvotes: 0

Views: 1384

Answers (1)

King Skippus

Reputation: 3826

There are a few things in the regex you can clean up, though nothing fatal. The parentheses around http aren't needed, since it doesn't look like you use the captured group later. If you're trying to make the http:// part optional, use (?:http:\/\/)? instead. Also, it's safer to double the backslashes inside a double-quoted PHP string (or to use single quotes): PHP leaves unrecognized escapes such as \/ and \d untouched, but relying on that is fragile. Would this work just as well?

$regex = "/^".
  "(?:http:\\/\\/)?".  // Look for http://, but make it optional.
  "(?:[A-Z0-9][A-Z0-9_-]*(?:\\.[A-Z0-9][A-Z0-9_-]*)+)". // Server name: one or more dot-separated parts after the first
  "(?::\\d+)?".        // Optional port number, including its leading colon
  "(?:\\/.*)?/i";      // Optional trailing forward slash and page info

There are probably better regexes out there for matching URLs. I'd suggest Googling "regex url" and having a look at them. Don't reinvent the wheel if you don't have to! Also note that the above doesn't allow for URLs without top-level domains, such as http://localhost/mypage.html. To allow those, make the dot-separated group on the "Server name" line (the part that starts with \\.) optional by quantifying it with *, so that a host name with no dots is accepted as well.
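Here is a small sketch of one way to accept dot-less hosts such as localhost: quantify the dot-separated group with * instead of +. The helper name host_regex and the sample URLs are made up for illustration, and the end anchor plus the path/query/fragment group are my own assumptions, not part of your original pattern. I've used single quotes so the backslashes don't need doubling.

```php
<?php
// Hypothetical helper: builds the pattern with a chosen quantifier on the
// dot-separated group ("+" requires at least one dot, "*" makes it optional).
function host_regex($quantifier) {
    return '/^'
        . '(?:http:\/\/)?'                                               // optional scheme
        . '[A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)' . $quantifier   // server name
        . '(?::\d+)?'                                                    // optional port
        . '(?:[\/?#].*)?$/i';                                            // optional path, query or fragment
}

$strict = host_regex('+');
$loose  = host_regex('*');

var_dump(preg_match($strict, 'http://example.com/page.html'));  // int(1)
var_dump(preg_match($strict, 'http://localhost/mypage.html'));  // int(0)
var_dump(preg_match($loose,  'http://localhost/mypage.html'));  // int(1)
```

The loose form also accepts any single word as a host, so whether that is acceptable depends on where the URLs come from.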

This is more verbose than it needs to be:

if(!filter_var($str, FILTER_VALIDATE_URL) || (!preg_match($regex, $str))) //if invalid URL
{               
    return FALSE;
}
else 
{
    return TRUE;
}

Your expression will yield a true/false value. How about just returning that, negating if needed?

return !(!filter_var($str, FILTER_VALIDATE_URL) || !preg_match($regex, $str));

Also, note that these expressions are equivalent (this is De Morgan's law):

!(!A || !B)
   A &&  B

So that could be simplified further to just:

return filter_var($str, FILTER_VALIDATE_URL) && preg_match($regex, $str);
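Putting the pieces together, the whole function might look something like the sketch below. The name is_valid_url is my own, and the pattern is a variant of the one discussed above with a few assumptions added: the scheme is required (FILTER_VALIDATE_URL rejects URLs without one anyway), the port needs a leading colon, and the pattern is anchored at the end. Treat it as a starting point rather than a vetted validator.

```php
<?php
// Hypothetical name; combines PHP's built-in filter with a stricter regex.
function is_valid_url($str) {
    $regex = '/^'
        . 'http:\/\/'                                       // scheme, http only
        . '[A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+'   // server name with at least one dot
        . '(?::\d+)?'                                       // optional port
        . '(?:[\/?#].*)?$/i';                               // optional path, query or fragment
    // filter_var() returns the URL string or false; preg_match() returns 1, 0 or false.
    // The && operator turns the combination into a plain boolean.
    return filter_var($str, FILTER_VALIDATE_URL) && preg_match($regex, $str);
}

var_dump(is_valid_url('http://example.com/page.html')); // bool(true)
var_dump(is_valid_url('not a url'));                    // bool(false)
var_dump(is_valid_url('ftp://example.com/'));           // bool(false)
```

The ftp URL passes filter_var() but fails the regex, which shows what the extra pattern adds on top of the built-in filter.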

Upvotes: 3
