Reputation: 95
I'm trying to get the plain text from this webpage: https://html2-f.scribdassets.com/55ssxtbbb45pk2eg/pages/319-42c28ee981.jsonp, which on inspection is a callback function that inserts HTML. I'm trying to scrape the page and reformat the output so that it actually shows the HTML instead of unreadable plain text.
PHP:
echo file_get_contents("https://html2-f.scribdassets.com/55ssxtbbb45pk2eg/pages/319-42c28ee981.jsonp");
The returned text is a complete mess:
����X321-5db7e88872.jsonp�Y]n�6���E�ıH�;��E�@���b�PM��%�f#K�H��}�;�z���:�eG"e��:@�E����j��XޖdJ���$�&$~����>a�8#��p�ӥy��X��8�r��(#kZ���85�j�A�%��������Ȇ�...
Whereas it should look like this:
"<div class=\"newpage\" id=\"page319\" style=\"width: 902px; height:1167px\">\n<div class=text_layer style=\"z-index:2\"><div class=ie_fix>\n \n<div class=\"ff81\" style=\"font-size:114px\">\n<span class=a style=\"left:331px;top:75px;color:#ffffff\">1<span class=w9></span>3</span></div>...
Although I could manually copy and paste the text from the webpage into a text editor, I would like to eliminate that step, as I'll need to do this for 320 pages.
Is there some workaround for .jsonp URLs? Or is the data encrypted by the server? (I just don't know.)
Upvotes: 2
Views: 52
Reputation: 9957
The response is gzip'd. You can see it in the response headers:
Content-Encoding: gzip
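You can confirm this from PHP itself: after a file_get_contents() call over HTTP, PHP populates the special $http_response_header variable with the response headers. A quick check, just as a sketch:
file_get_contents("https://html2-f.scribdassets.com/55ssxtbbb45pk2eg/pages/319-42c28ee981.jsonp");
print_r($http_response_header); // look for a "Content-Encoding: gzip" line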
So you need to decompress it. You can do this either by changing your whole approach and using cURL, or by using the stream wrapper compress.zlib://. Just prepend that to the URL:
echo file_get_contents("compress.zlib://https://html2-f.scribdassets.com/55ssxtbbb45pk2eg/pages/319-42c28ee981.jsonp");
That will get you the correct response. Note that this is still a JSONP response, so it comes in the form of a callback; you need to decide what to do with it.
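For example, if the callback argument is a single JSON-encoded string of HTML (as your expected output suggests), you could strip the wrapper with a regex and decode what's left. A rough sketch; the pattern assumes the usual someCallback("...");  shape and isn't tested against every page:
$jsonp = file_get_contents("compress.zlib://https://html2-f.scribdassets.com/55ssxtbbb45pk2eg/pages/319-42c28ee981.jsonp");
// Keep only the argument between the first "(" and the final ")"
if (preg_match('/^[^(]*\((.*)\)\s*;?\s*$/s', $jsonp, $matches)) {
    echo json_decode($matches[1]); // decodes the JSON string into plain HTML
}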
Upvotes: 2