Reputation: 91
I'm a little out of my depth here but believe I am now on the right track. I want to take user-supplied URLs and store them in a database so that the links can then be used on a user profile page.
The links I'm hoping users will supply will be for social media sites, Facebook and the like. While looking for a way to safely store user-supplied URLs I found this page http://electrokami.com/coding/use-php-to-format-and-validate-a-url-with-these-easy-functions/. The code works but seems to remove nearly everything. If I use "www.example.com/user.php?u=borris" it just returns "example.com is valid".
Then I found out about regular expressions and found this line of code
/(?:https?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*([\w\-\.]*)/
from this site https://gist.github.com/marcgg/733592 and another Stack Overflow post, "Check if a string contains a url and get contents of url php".
I tried to merge the code together so that I get something that would validate the link for a facebook profile or page. I don't want to get profile info, pics etc but my code's not right either, so rather than getting deeper into stuff I don't fully understand yet I thought asking for help was best.
Below is the code I mashed together which gave me the error "Warning: preg_match_all() [function.preg-match-all]: Compilation failed: unmatched parentheses at offset 29... on line 9"
<?php
// get url to check from the page parameter 'url'
// or use default http://example.com
$text = isset($_GET['url'])
    ? $_GET['url']
    : "http://www.vwrx-project.co.uk/user.php?u=borris";
$reg_exurl = "/(?:http|https|ftp|ftps)?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*([\w\-\.]*)/";
preg_match_all($reg_exurl, $text, $matches);
$usedPatterns = array();
$url = '';
foreach ($matches[0] as $pattern) {
    if (!array_key_exists($pattern, $usedPatterns)) {
        $usedPatterns[$pattern] = true;
        $url = $pattern;
    }
}
?>
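For reference, offset 29 in that warning is the `)` that follows `:\/\/`: the scheme list is closed before the `:\/\/`, so the later `)` has no opening partner. A minimal sketch of a balanced version, assuming the intent was to make the whole "scheme plus ://" prefix optional, is below. The helper name `facebook_name` is made up for illustration, and it uses `preg_match` rather than `preg_match_all` to keep the example short.

```php
<?php
// Sketch only: the scheme alternatives and "://" now sit inside ONE optional group.
// Single quotes avoid any surprises with backslashes in double-quoted strings.
$pattern = '/(?:(?:https?|ftps?):\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*([\w\-\.]*)/';

// Hypothetical helper: returns the last path segment captured by the pattern,
// or null when the text contains no facebook.com link.
function facebook_name($text, $pattern) {
    if (preg_match($pattern, $text, $m) === 1) {
        return $m[1];
    }
    return null;
}
```

Note that the pattern is not anchored, so it will also find "facebook.com/..." buried inside some other URL. It is useful for extracting a name, but probably not sufficient as a validation step on its own.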
--- Additional ---
I took a fresh look at the answer Dave provided today and felt I could work with it. It makes more sense to me from a code perspective, as I can follow the process.
I now have a system I'm partly happy with. If I supply the link http://www.facebook.com/#!/lilbugga, which is a typical link from Facebook (the one you get when clicking on your username/profile pic from your wall), I get the result http://www.facebook.com/lilbugga, which shows as valid.
What it can't handle is a Facebook link that isn't in a vanity/SEO-friendly format, such as https://www.facebook.com/profile.php?id=4. If I allow my code to accept ? and = then I suspect I'm leaving my website/database open to attack, which I don't want.
What's the best option now? This is the code I have:
<?php
$dirty_url = "http://www.facebook.com/profile.php?id=4"; //user supplied link
//clean url leaving alphanumerics : / . only - required to remove facebook link format with /#!/
$clean_url = preg_replace('#[^a-z0-9:/.]#i', '', $dirty_url);
$parsed_url = parse_url($clean_url); // parse URL to get a breakdown of its components
$safe_host = $parsed_url['host']; // safe host direct from parse_url
// str_replace to switch any // to a / inside the returned path - required due to preg_replace process above
echo $safe_path = str_replace("//", "/", ($parsed_url['path']));
if ($parsed_url['host'] == 'www.facebook.com') {
echo "<a href=\"http://$safe_host$safe_path\" alt=\"facebook\" target=\"_new\">Facebook</a>";
} else {
echo " :( invalid url";
}
?>
Upvotes: 1
Views: 2875
Reputation: 46841
I took a regex pattern from HERE.
Use it to get the matched groups:
(?:http|https|ftp|ftps(?:\/\/)?)?(?:www.|[-;:&=\+\$,\w]+@)([A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??((?:[-\+=&;%@.\w_]*)#?(?:[\w]*)?))
Input:
www.example.com/user.php?u=borris
http://www.vwrx-project.co.uk/user.php?u=borris
Output:
MATCH 1
1. [4-15] `example.com`
2. [15-33] `/user.php?u=borris`
3. [25-33] `u=borris`
MATCH 2
1. [45-63] `vwrx-project.co.uk`
2. [63-81] `/user.php?u=borris`
3. [73-81] `u=borris`
Upvotes: 0
Reputation: 64657
Not sure exactly what you are trying to accomplish, but it sounds like you could use parse_url for this:
<?php
$parsed_url = parse_url($_GET['url']);
//assume it's "http://www.vwrx-project.co.uk/user.php?u=borris"
print_r($parsed_url);
/*
Array
(
[scheme] => http
[host] => www.vwrx-project.co.uk
[path] => /user.php
[query] => u=borris
)
*/
if ($parsed_url['host'] == 'www.facebook.com') {
//do stuff
}
?>
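Building on the parse_url idea, here is a rough sketch of how the host check could be extended to accept both the vanity form and the numeric profile.php?id= form without stripping characters first. The function name `normalise_facebook_url` is hypothetical, and the rules (only www.facebook.com, only a single-segment vanity path, digits-only id, output always rebuilt as https) are assumptions chosen for illustration, not Facebook's actual URL rules.

```php
<?php
// Hypothetical helper: returns a rebuilt Facebook profile URL, or null if rejected.
function normalise_facebook_url($input) {
    // Turn the old "/#!/name" style into "/name" before parsing.
    $input = str_replace('/#!/', '/', trim($input));

    // parse_url() only reports a host when a scheme is present.
    if (!preg_match('#^https?://#i', $input)) {
        $input = 'http://' . $input;
    }

    $parts = parse_url($input);
    if ($parts === false || !isset($parts['host'], $parts['path'])) {
        return null;
    }
    if (strtolower($parts['host']) !== 'www.facebook.com') {
        return null;
    }

    // Numeric form: /profile.php?id=4 (the id must consist of digits only).
    if ($parts['path'] === '/profile.php') {
        parse_str(isset($parts['query']) ? $parts['query'] : '', $query);
        if (isset($query['id']) && is_string($query['id']) && ctype_digit($query['id'])) {
            return 'https://www.facebook.com/profile.php?id=' . $query['id'];
        }
        return null;
    }

    // Vanity form: /lilbugga (letters, digits, dots and hyphens only).
    if (preg_match('#^/([A-Za-z0-9.\-]+)/?$#', $parts['path'], $m)) {
        return 'https://www.facebook.com/' . $m[1];
    }
    return null;
}
```

The idea is that the stored URL is rebuilt from validated pieces rather than copied from user input, so ? and = only ever appear in the fixed profile.php?id= part. It would still make sense to store the result with a prepared statement and pass it through htmlspecialchars() when echoing it into the page.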
Upvotes: 1