Reputation: 124
I'm working on a snippet that needs to validate URLs, so that I know I'm sending data to the correct address. For this I am using the filter_var() function.
I started running into issues as soon as I began testing. This is my code:
<?php
function post($webLink) {
    $url = filter_var($webLink, FILTER_SANITIZE_URL);
    if (filter_var($url, FILTER_VALIDATE_URL)) {
        echo 'Correct';
    } else {
        echo 'Please check your url.';
    }
}

post('h://www.google.com');
?>
A lot of invalid links are reported as correct URLs, including the one in the code above.
Links that passed validation include:
ht1tp://www.google.com
h://ww.google.com
http://www.google.
http://www.google.343
I refuse to believe that the function itself is validating these links as correct; I'd like to think that something is wrong in my if (filter_var($url, FILTER_VALIDATE_URL)) line.
I need clarification on how to properly use this please. Thanks
Upvotes: 4
Views: 2372
Reputation: 1269
First, only validate input. Never sanitize input. Do not sanitize until it is ready to become output. This is a general rule of handling data across the board, and is just as important for displaying URLs securely as it is for preventing XSS attacks, SQL injections, and the like.
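To illustrate why sanitizing before validating can hide bad input, here is a minimal sketch. The input string is made up for the example; the point is only that FILTER_SANITIZE_URL silently strips characters such as spaces, so the string that reaches the validator is no longer the string the user sent.

```php
<?php
// Illustrative input only: the space makes this string an invalid URL.
$raw = 'ht tp://www.google.com';

// Validating the raw input rejects it, which is the desired outcome.
$rawIsValid = filter_var($raw, FILTER_VALIDATE_URL) !== false;

// Sanitizing first strips the space, so the altered string now validates.
$sanitized = filter_var($raw, FILTER_SANITIZE_URL);
$sanitizedIsValid = filter_var($sanitized, FILTER_VALIDATE_URL) !== false;

var_dump($rawIsValid, $sanitized, $sanitizedIsValid);
```

In the question's code, this is why the sanitize step should probably be dropped: validate $webLink directly and reject it if validation fails.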
Second, FILTER_VALIDATE_URL validates URLs against RFC 2396. That RFC does not mandate any particular scheme, though it does give several examples (e.g., HTTP:, GOPHER:, MAILTO:). The PHP manual page on the validate filters explicitly states:
Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:.
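The "further validation" the manual mentions could look roughly like the sketch below. The helper name isAllowedUrl() and its default list of schemes are assumptions made for this example, not part of PHP; only filter_var() and parse_url() are built in.

```php
<?php
// Hypothetical helper: accept a URL only if it is syntactically valid
// AND its scheme is on an explicit allow-list.
function isAllowedUrl(string $url, array $allowedSchemes = ['http', 'https']): bool
{
    // FILTER_VALIDATE_URL checks generic URL syntax, not the scheme itself.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // parse_url() returns null when the URL has no scheme component.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        return false;
    }

    // Schemes are case-insensitive, so normalise before comparing.
    return in_array(strtolower($scheme), $allowedSchemes, true);
}

var_dump(isAllowedUrl('https://www.google.com'));
var_dump(isAllowedUrl('h://www.google.com'));
var_dump(isAllowedUrl('ht1tp://www.google.com'));
```

With this check, made-up schemes such as h:// or ht1tp:// are rejected even when the filter alone would let them through.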
Also, the RFC does not define the structure of domain names or expect any specific top-level domains, so the validate filter does not check those. Domain names are formally assigned by registrars following ICANN rules, but you are free to configure your own local DNS server with any entries you want, including TLD-only entries. In that sense any domain name can be valid, whether it passes the validation filter or not.
The most secure way to validate some well defined data is to whitelist it. If you really want to make sure that nobody is passing you "ht tp:com.google.xssHackHere" then you will need to do further checking on your own. Be aware that there are now several hundred valid TLDs, and not all of them are easily expressed in ASCII characters, if you want to validate domain names as well as the scheme.
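If the set of destinations is known in advance, the whitelisting idea can be sketched as follows. The constant TRUSTED_HOSTS, the function isTrustedEndpoint(), and the host names in the list are all placeholders invented for this example; substitute the endpoints your code is actually supposed to talk to.

```php
<?php
// Hypothetical allow-list: replace these placeholder hosts with the
// endpoints your application is actually meant to send data to.
const TRUSTED_HOSTS = ['www.google.com', 'api.example.com'];

function isTrustedEndpoint(string $url): bool
{
    // Reject anything that is not even syntactically a URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($scheme) || !is_string($host)) {
        return false;
    }

    // Only web schemes are accepted in this sketch.
    if (!in_array(strtolower($scheme), ['http', 'https'], true)) {
        return false;
    }

    // Host names are case-insensitive; compare against the allow-list exactly.
    return in_array(strtolower($host), TRUSTED_HOSTS, true);
}

$candidates = ['http://www.google.com', 'http://www.google.343', 'http://www.google.', 'h://ww.google.com'];
foreach ($candidates as $candidate) {
    echo $candidate, ' => ', isTrustedEndpoint($candidate) ? 'accepted' : 'rejected', PHP_EOL;
}
```

An exact-match list like this sidesteps the TLD problem entirely, at the cost of having to maintain the list by hand.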
Upvotes: 9