Reputation: 435
I'm using DomCrawler to get data from a Google Play page, and it works in 99% of cases, but I stumbled upon a page where it cannot find a specific div. I checked the HTML and the div is definitely there. My code is:
$autoloader = require __DIR__.'/vendor/autoload.php'; // forward slashes work on Windows and Unix alike
use Symfony\Component\DomCrawler\Crawler;
$app_id = 'com.balintinfotech.sinhalesekeyboardfree';
$response = file_get_contents('https://play.google.com/store/apps/details?id='.$app_id);
$crawler = new Crawler($response);
echo $crawler->filter('div[itemprop="datePublished"]')->text();
When I run that specific page I get
PHP Fatal error: Uncaught InvalidArgumentException: The current node list is empty.
However, if I use any other ID, I get the desired result. What exactly is it about that page that breaks DomCrawler?
Upvotes: 1
Views: 1888
Reputation: 9957
As you correctly figured out, this doesn't happen in the English version, but it does in the Spanish one.
One difference I could spot was a comment by a user saying නියමයි ඈ. There seems to be something bothering the Crawler there. If you replace null characters (\x00) with an empty string, the Crawler correctly gets what you're looking for:
<?php
require __DIR__.'/vendor/autoload.php'; // Composer autoloader, needed for the Crawler class
$app_id = 'com.balintinfotech.sinhalesekeyboardfree';
$response = file_get_contents('https://play.google.com/store/apps/details?hl=en&id='.$app_id);
// Strip NUL bytes before handing the markup to the parser
$response = str_replace("\x00", "", $response);
$crawler = new Symfony\Component\DomCrawler\Crawler($response);
var_dump($crawler->filter('div[itemprop="datePublished"]')->text()); // string(14) "March 14, 2017"
I'll try to look more into this.
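To try the idea without hitting Google Play, here is a minimal sketch that uses PHP's built-in DOM extension (ext-dom) instead of DomCrawler, so it runs without Composer. The markup is a made-up stand-in for the real page, and `stripNullBytes` is a hypothetical helper name, not part of any library. It also checks the node list before reading from it, which avoids a fatal error when nothing matches.

```php
<?php
// Hypothetical helper (not part of DomCrawler): remove NUL bytes
// and report how many were found.
function stripNullBytes(string $html): array
{
    $count = substr_count($html, "\x00");
    return [str_replace("\x00", '', $html), $count];
}

// Stand-in markup with a NUL byte ahead of the target node.
// This is an assumption for illustration, not the real page.
$html = "<html><body><div class=\"review\">text\x00</div>"
      . "<div itemprop=\"datePublished\">March 14, 2017</div></body></html>";

[$clean, $removed] = stripNullBytes($html);
echo "NUL bytes removed: {$removed}\n";

// Parse the cleaned markup with the built-in DOM extension.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($clean);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@itemprop="datePublished"]');

// Check the node list before reading from it, so a miss is reported
// as a message rather than an uncaught exception.
if ($nodes->length > 0) {
    echo trim($nodes->item(0)->textContent), "\n";
} else {
    echo "datePublished node not found\n";
}
```

Passing `$html` instead of `$clean` to `loadHTML` shows how your own libxml build copes with the raw NUL byte. The result may differ between libxml versions, so treat the cleaning step as a defensive measure rather than a confirmed diagnosis.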
Upvotes: 1