SomniusX

Reputation: 83

PHP script that fetches og:image from a URL works on most sites but fails on specific ones

Hello, I'm trying to build a custom PHP script that collects a page's meta tags (including og:image) into an array and then prints one specific value. I've used the following code:

<?php
$_URL = $_GET['url']; // the target URL comes from the "url" query parameter
// Collect the page title plus the description, keywords and og:* meta tags.
function getSiteOG( $url, $specificTags=[] ){
    $doc = new DOMDocument();
    @$doc->loadHTML(file_get_contents($url)); // @ hides warnings from malformed HTML
    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $res['title'] = $titleNode ? $titleNode->nodeValue : null; // some pages have no <title>
    foreach ($doc->getElementsByTagName('meta') as $m){
        $tag = $m->getAttribute('name') ?: $m->getAttribute('property');
        if(in_array($tag,['description','keywords']) || strpos($tag,'og:')===0) $res[str_replace('og:','',$tag)] = $m->getAttribute('content');
    }
    // Optionally restrict the result to the requested tags.
    return $specificTags? array_intersect_key( $res, array_flip($specificTags) ) : $res;
}
$_ARRAY = getSiteOG($_URL);
echo $_ARRAY['image'];
?>

When it is called with the following syntax, e.g. on our site

tags.php?url=http://www.stackoverflow.com

it prints out the following result

https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon@2.png?v=73d79a89bded

Which is acceptable.

The script is called from a batch file using the following method

@echo off
PowerShell -Command "(new-object net.webclient).DownloadString('http://yoursite.com/tags.php?url=https://www.banggood.com/TKEXUN-M2-Flip-Phone-2800mAh-3_0-inch-Touch-Screen-Blutooth-FM-Dual-Sim-Card-Flip-Feature-Phone-p-1367504.html')"
PowerShell -Command "(new-object net.webclient).DownloadString('http://yoursite.com/tags.php?url=https://www.banggood.com/Xiaomi-Mi-9T-Pro-Global-Version-6_39-inch-48MP-Triple-Camera-NFC-4000mAh-6GB-64GB-Snapdragon-855-Octa-core-4G-Smartphone-p-1547570.html?ID=564486&cur_warehouse=HK')"
PowerShell -Command "(new-object net.webclient).DownloadString('http://yoursite.com/tags.php?url=https://www.banggood.com/OnePlus-7-6_41-Inch-FHD-AMOLED-Waterdrop-Display-60Hz-NFC-3700mAh-48MP-Rear-Camera-8GB-256GB-UFS-3_0-Snapdragon-855-Octa-Core-4G-Smartphone-p-1499559.html?ID=62208216150349&cur_warehouse=HK')"

This prints the resulting links on the screen, or writes them to a file when the output is redirected. It also works with a list of URLs read from a file by another batch script, but that doesn't matter here.

The problem I'm experiencing is this:

When I try to fetch the og:image from links on the Gearbest website, for example this one

https://www.gearbest.com/headsets/pp_009839056462.html

I get no results!

I've run simple commands like wget -qO- url, or curl -I url for the headers, and my conclusion is that it has something to do with how my PHP (or even curl) was compiled, on the SSL side. I've read here that some sites need newer, more secure SSL versions.

Note that I've also tried masquerading the wget request by changing the user agent and other cookie-related values on the fly, but still with no success.

I'm on shared hosting with access to a jailed shell that has many binary tools (sed, awk, wget, curl, etc.). The host is quite helpful in resolving my problems by adding binaries I may need, but I still don't know how to proceed.

Any help is greatly appreciated.

Upvotes: 1

Views: 376

Answers (1)

George Tasioulis

Reputation: 121

You're probably blocked due to your user-agent. I tried a plain curl to Gearbest as well, and got a 403 permission denied error. Akamai seems to be blocking the default user-agent.

But when I used something like curl -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" URL it worked fine.
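
On the PHP side, the same idea could be applied by passing a stream context to file_get_contents() so the request carries a browser-like User-Agent. The following is only a minimal sketch under that assumption: the helper names (browserContext, extractOgImage, fetchOgImage) and the timeout are made up for illustration, and the User-Agent string is the one from the curl command above. Whether the site accepts the request may depend on more than this one header.

```php
<?php
// Hypothetical constant: the browser User-Agent string used in the curl command above.
const BROWSER_UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36';

// Build a stream context that makes file_get_contents() send a browser-like User-Agent.
function browserContext(string $userAgent = BROWSER_UA)
{
    return stream_context_create([
        'http' => [
            'user_agent'    => $userAgent,
            'timeout'       => 10,   // seconds; an arbitrary choice
            'ignore_errors' => true, // return the body even on 4xx/5xx responses
        ],
    ]);
}

// Return the og:image value found in an HTML string, or null if there is none.
function extractOgImage(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @ hides warnings from sloppy real-world markup
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->getAttribute('property') === 'og:image') {
            return $meta->getAttribute('content');
        }
    }
    return null;
}

// Fetch a page with the browser-like context and return its og:image, or null on failure.
function fetchOgImage(string $url): ?string
{
    $html = @file_get_contents($url, false, browserContext());
    return $html === false ? null : extractOgImage($html);
}
```

In the script from the question, this would roughly amount to replacing the bare file_get_contents($url) call, for example with echo fetchOgImage($_GET['url']);. After a request, PHP also fills $http_response_header in the calling scope, which can be inspected to see whether the server answered with a 403.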

Upvotes: 2
