Jonas Hallmann

Reputation: 23

Simple HTML DOM cannot get file

I have no clue what the solution might be. I simply cannot get the html file of this Charizard, I don't get any response even though the link is correct. Bulbasaur is working fine, but I want this lovely Charizard...

include("simple_html_dom.php");
$html = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)');
$html2 = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Bulbasaur_(Pok%C3%A9mon)');
echo $html;
echo $html2;

Does this page have any protection or is Charizard only harder to catch? I'd appreciate if you are able to help me with this.

Jonas :)

Upvotes: 2

Views: 1269

Answers (3)

Nima

Reputation: 3409

There are two problems here:

  1. The length of the content fetched from this URL exceeds MAX_FILE_SIZE (defined in simple_html_dom.php).
  2. The bug that was pointed out in the comments (https://github.com/sunra/php-simple-html-dom-parser/issues/37). This bug seems to be fixed in the fork that is maintained on GitHub, but it still exists in the original version (which no longer seems to be maintained).

To solve the first problem, edit simple_html_dom.php and change define('MAX_FILE_SIZE', 600000); to use a bigger number.

As a workaround for the second problem, pass correct parameters to file_get_html, and by that I mean to pass 0 for $offset:

$html = file_get_html(
    'https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)',
    false, // $use_include_path
    null,  // $context
    0      // $offset: this last one is the offset
);

var_dump($html);

Alternatively you can use the forked version of the library.
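
To check whether the first problem applies to a given page, you can compare the size of the raw response with the limit. This is only a minimal sketch: exceeds_size_limit() is a hypothetical helper, not part of simple_html_dom, and the default of 600000 simply mirrors the define() quoted above.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): reports whether a
// fetched document is longer than a size limit such as MAX_FILE_SIZE.
// The default of 600000 is an assumption copied from the define() above.
function exceeds_size_limit(string $contents, int $limit = 600000): bool
{
    // strlen() counts bytes, which is also what the library compares.
    return strlen($contents) > $limit;
}
```

With network access you could call it as exceeds_size_limit(file_get_contents($url)). If it returns true for the Charizard page and false for the Bulbasaur page, the size limit explains at least part of the difference.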

Upvotes: 3

pguardiario

Reputation: 54984

I'm going to suggest an alternative library because I don't think you will get this with simple_html_dom:

include 'advanced_html_dom.php';
$html = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)');

echo $html->find('h1', 0)->text() . PHP_EOL;
echo $html->find('big a[title*="Pokédex number"]', 0)->text() . PHP_EOL;

This gives:

Charizard (Pokémon)
#006

Upvotes: 0

alex55132

Reputation: 177

Since I haven't found file_get_html() in the PHP docs, maybe you would prefer to use file_get_contents($url) instead.

Upvotes: -1
