Reputation: 111
The simple script below returns a bunch of rubbish. It works for most websites, but not William Hill:
var Browser = require("zombie");
var assert = require("assert");

// Load the William Hill football page and dump the HTML once it settles
var browser = new Browser();
browser.visit("http://sports.williamhill.com/bet/en-gb/betting/y/5/et/Football.html", function () {
  browser.wait(function () {
    console.log(browser.html());
  });
});
I run it with node, and this is the output:
S����J����ꪙRUݒ�kf�6���Efr2�Riz�����^��0�X� ��{�^�a�yp��p�����Ή��`��(���S]-��'N�8q�����/���?�ݻ��u;�݇�ׯ�Eiٲ>��-���3�ۗG�Ee�,��mF���MI��Q�۲������ڊ�ZG��O�J�^S�C~g��JO�緹�Oݎ���P����ET�n;v������v���D�tvJn��J�8'��햷r�v:��m��J��Z�nh�]�� ����Z����.{Z��Ӳl�B'�.¶D�~$n�/��u"�z�����Ni��"Nj��\00_I\00\��S��O�E8{"�m;�h��,o��Q�y��;��a[������c��q�D�띊?��/|?:�;��Z!}��/�wے�h�<�������%������A�K=-a��~'
(actual output is much longer)
Does anyone know why this happens, and specifically why it happens on the only site I actually want to scrape?
Thanks
Upvotes: 2
Views: 1421
Reputation: 111
I abandoned this approach a long time ago, but in case anyone is interested, I got a reply from one of the zombie.js devs:
https://github.com/assaf/zombie/issues/251#issuecomment-5969175
He says: "Zombie will now send accept-encoding header to indicate it does not support gzip."
In other words, the "rubbish" was the gzip-compressed response body being printed as raw text, because zombie.js was not decompressing it.
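For anyone stuck on a zombie version from before that fix, here is a rough workaround sketch that bypasses zombie entirely: fetch the page with Node's core http module and decompress it with zlib (the host and path are the ones from my question; everything else is just illustrative):

var http = require("http");
var zlib = require("zlib");

var options = {
  host: "sports.williamhill.com",
  path: "/bet/en-gb/betting/y/5/et/Football.html",
  // Ask for gzip explicitly so we know how to decode the reply
  headers: { "accept-encoding": "gzip" }
};

http.get(options, function (res) {
  var chunks = [];
  res.on("data", function (chunk) { chunks.push(chunk); });
  res.on("end", function () {
    var body = Buffer.concat(chunks);
    if (res.headers["content-encoding"] === "gzip") {
      // Decompress the gzipped body before treating it as HTML
      zlib.gunzip(body, function (err, html) {
        if (err) throw err;
        console.log(html.toString("utf8"));
      });
    } else {
      console.log(body.toString("utf8"));
    }
  });
});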
Thank you all who looked into this.
Upvotes: 1
Reputation: 15849
The same code works for other sites that also use gzip in their replies, so it's not a problem with your code.
My guess is that the site is detecting that you are not running a real browser and is defending against data extraction.
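One way to test that guess (a rough sketch using only Node's core http module; the User-Agent string is just an example value) is to request the page with and without a browser-like User-Agent and compare the responses:

var http = require("http");

// Fetch the page with and without a browser-like User-Agent and compare
// status codes and body sizes to see if the site treats them differently
function fetch(userAgent, label) {
  var options = {
    host: "sports.williamhill.com",
    path: "/bet/en-gb/betting/y/5/et/Football.html",
    headers: userAgent ? { "user-agent": userAgent } : {}
  };
  http.get(options, function (res) {
    var length = 0;
    res.on("data", function (chunk) { length += chunk.length; });
    res.on("end", function () {
      console.log(label + ": status " + res.statusCode + ", " + length + " bytes");
    });
  });
}

fetch("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.162 Safari/535.19", "browser UA");
fetch(null, "no UA");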
Upvotes: 0