Reputation: 83
Hello, I'm trying to build a custom PHP script that fetches a page's og:image property into an array and then prints that specific result. I've used the following code:
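<?php
$_URL = $_GET['url'] ?? ''; // target page URL, taken from the "url" query parameter
// Returns the page title plus the description, keywords and og:* meta tags.
// $specificTags: optional list of keys to keep, e.g. ['image'].
function getSiteOG( $url, $specificTags = [] ){
    $res = [];
    $doc = new DOMDocument();
    @$doc->loadHTML(file_get_contents($url)); // @ hides warnings from malformed HTML
    $title = $doc->getElementsByTagName('title')->item(0);
    $res['title'] = $title ? $title->nodeValue : ''; // some pages have no <title>
    foreach ($doc->getElementsByTagName('meta') as $m){
        $tag = $m->getAttribute('name') ?: $m->getAttribute('property');
        if(in_array($tag,['description','keywords']) || strpos($tag,'og:')===0) $res[str_replace('og:','',$tag)] = $m->getAttribute('content');
    }
    return $specificTags ? array_intersect_key( $res, array_flip($specificTags) ) : $res;
}
$_ARRAY = getSiteOG($_URL);
echo $_ARRAY['image'] ?? ''; // prints nothing when the page has no og:image
?>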
When it is called with the following syntax, e.g. on our site,
tags.php?url=http://www.stackoverflow.com
it prints out the following result
https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon@2.png?v=73d79a89bded
Which is acceptable.
The script is run from a batch file using the following method:
@echo off
PowerShell -Command "(new-object net.webclient).DownloadString('http://yoursite.com/tags.php?url=https://www.banggood.com/TKEXUN-M2-Flip-Phone-2800mAh-3_0-inch-Touch-Screen-Blutooth-FM-Dual-Sim-Card-Flip-Feature-Phone-p-1367504.html')"
PowerShell -Command "(new-object net.webclient).DownloadString('http://yoursite.com/tags.php?url=https://www.banggood.com/Xiaomi-Mi-9T-Pro-Global-Version-6_39-inch-48MP-Triple-Camera-NFC-4000mAh-6GB-64GB-Snapdragon-855-Octa-core-4G-Smartphone-p-1547570.html?ID=564486&cur_warehouse=HK')"
PowerShell -Command "(new-object net.webclient).DownloadString('http://yoursite.com/tags.php?url=https://www.banggood.com/OnePlus-7-6_41-Inch-FHD-AMOLED-Waterdrop-Display-60Hz-NFC-3700mAh-48MP-Rear-Camera-8GB-256GB-UFS-3_0-Snapdragon-855-Octa-Core-4G-Smartphone-p-1499559.html?ID=62208216150349&cur_warehouse=HK')"
This prints the resulting links on the screen, or writes them to a file when the output is redirected.
It also works with a list of URLs read from a file by another batch script, but that doesn't matter right now.
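A side note on the commands above: two of the target URLs carry their own query string (?ID=...&cur_warehouse=HK). When such a URL is appended unencoded, PHP splits the request on the &, so $_GET['url'] only receives the part up to it. The sketch below shows one way to percent-encode the target first. The helper name buildTagsRequest and the example.com URL are made up for illustration; they are not part of the original script.

```php
<?php
// Hypothetical helper: builds the request URL for tags.php with the target
// page URL percent-encoded, so its own "?" and "&" stay inside "url".
function buildTagsRequest(string $endpoint, string $targetUrl): string
{
    return $endpoint . '?' . http_build_query(['url' => $targetUrl]);
}

// Placeholder target URL with a nested query string (an assumption).
$target  = 'https://www.example.com/item.html?ID=564486&cur_warehouse=HK';
$request = buildTagsRequest('http://yoursite.com/tags.php', $target);
echo $request, PHP_EOL;

// Parsing the query string again shows what $_GET['url'] would contain.
parse_str(parse_url($request, PHP_URL_QUERY), $params);
echo $params['url'], PHP_EOL;
```

The encoded request URL is what would go into the DownloadString call in the batch file.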
The problem I'm experiencing is this:
when I try to fetch the og:image of links from the Gearbest website, for example this one,
https://www.gearbest.com/headsets/pp_009839056462.html
I get no results!!!
I've run simple commands like wget -qO- url
or curl -I url
to check the headers, and the results suggest it has something to do with how my PHP, or even curl, was compiled on the SSL side.
I've read here that some sites need newer, more secure SSL versions, etc.
Note that I've also tried masquerading the wget request by changing the user agent and other cookie-related values on the fly, but still with no success.
I'm on shared hosting with access to a jailed shell that has many binary tools (sed, awk, wget, curl, etc.). The host is quite helpful in resolving my problems by adding binaries I may need, but I still don't know how to proceed.
Any help is greatly appreciated.
Upvotes: 1
Views: 376
Reputation: 121
You're probably being blocked because of your user agent. I tried a curl request to Gearbest as well and got a 403 (Forbidden) error. Akamai seems to be blocking this user agent.
But when I used something like curl -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" URL
it worked fine.
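To apply the same idea inside the PHP script, file_get_contents() can be given a stream context that sets the User-Agent. This is only a minimal sketch: the helper names browserContext and extractOg are made up for illustration, the User-Agent string is the one from the curl command, and there is no guarantee the site won't block on other signals as well.

```php
<?php
// Browser-like User-Agent, copied from the curl command (an assumption
// that this particular string is accepted by the target site).
const BROWSER_UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36';

// Hypothetical helper: stream context that makes file_get_contents() send
// the given User-Agent and return the body even on 4xx/5xx responses.
function browserContext(string $userAgent = BROWSER_UA)
{
    return stream_context_create([
        'http' => [
            'user_agent'    => $userAgent,
            'timeout'       => 15,
            'ignore_errors' => true,
        ],
    ]);
}

// Hypothetical helper: collect og:* properties from an HTML string.
function extractOg(string $html): array
{
    if ($html === '') {
        return []; // nothing to parse, e.g. after a failed download
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ hides warnings from malformed HTML
    $og = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        if (strpos($property, 'og:') === 0) {
            $og[substr($property, 3)] = $meta->getAttribute('content');
        }
    }
    return $og;
}

// Usage (needs network access, so it is left commented out):
// $html = file_get_contents($url, false, browserContext());
// echo $http_response_header[0], PHP_EOL; // status line, useful for spotting a 403
// echo extractOg((string) $html)['image'] ?? 'no og:image found', PHP_EOL;
```

Printing the status line first makes it easier to tell a blocked request (403) apart from a page that simply has no og:image tag.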
Upvotes: 2