Konstantin Rusanov

Reputation: 6554

Python Request binary content

I'm trying to get JSON from a Google Trends URL, but I can't convert it to JSON because the content comes back as bytes (b''). How can I get this result as JSON?

My simple code:

import requests
r = requests.get('https://trends.google.ru/trends/api/stories/latest?hl=ru&tz=-180&cat=all&fi=15&fs=15&geo=RU&ri=300&rs=15&sort=0')
print(r.content)

r.content starts with:

b')]}\'\n{"featuredStoryIds":[],"trendingStoryIds":["RU_lnk_iJ8H1AAwAACP-M_ru","RU_lnk_7H7L0wAwAAAnHM_ru","RU_lnk_Q-IB1AAwAABChM_ru","RU_lnk_EErj0wAwAADzKM_ru","RU_lnk_VY2s0wAwAAD57M_ru","RU_lnk_sdUP1AAwAAC-sM_ru","RU_lnk_ILv60wAwAADa2M_ru","RU_lnk_O6j70wAwAADAyM_ru","RU_lnk_fVQS1AAwAABvMM_ru","RU_lnk_TJ8D1AAwAABP-M_ru","RU_lnk_I97F0wAwAADmvM_ru","RU_lnk_tCrq0wAwAABeSM_ru","RU_lnk_W8EA1AAwAABbpM_ru","RU_lnk_IYX90wAwAADc5M_ru","RU_lnk_bz4M1AAwAABjWM_ru","RU_lnk_EJ-...

Decoding this with the r.json() method fails:

simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Upvotes: 0

Views: 4148

Answers (2)

Lena Weber

Reputation: 272

Maybe try this; it might help:

import requests
r = requests.get('https://trends.google.ru/trends/api/stories/latest?hl=ru&tz=-180&cat=all&fi=15&fs=15&geo=RU&ri=300&rs=15&sort=0')
# Check the HTTP status code of the response
page = r.status_code
print(page)

Upvotes: -2

Martijn Pieters

Reputation: 1122392

You are contacting a Google service, and Google is prefixing JSON with some extra data to prevent JSON hijacking:

>>> import requests
>>> r = requests.get('https://trends.google.ru/trends/api/stories/latest?hl=ru&tz=-180&cat=all&fi=15&fs=15&geo=RU&ri=300&rs=15&sort=0')
>>> r.content[:10]
b')]}\'\n{"fea'

Note the )]}' and newline characters at the start.
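To see why the prefix breaks parsing, here is a minimal offline sketch using the standard library json module rather than a live request. The raw string is a made-up sample that only mimics the start of the response shown above; it is not real Google Trends output.

```python
import json

# Hypothetical sample mimicking the start of the response: guard prefix, newline, JSON
raw = ")]}'\n{\"featuredStoryIds\": []}"

try:
    json.loads(raw)
    error_position = None
except json.JSONDecodeError as exc:
    # The parser gives up on the very first character, the ")" of the prefix
    error_position = exc.pos

print(error_position)
```

The failure at position 0 matches the "line 1 column 1 (char 0)" in the error from the question.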

You need to remove this extra data first and manually decode; there are no other newlines in the payload so we can just split on the newline:

import json

json_body = r.text.splitlines()[-1]
json_data = json.loads(json_body)

I used Response.text here to get decoded string data (the server sets the correct content type encoding in the headers).
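If you would rather not rely on the payload containing no other newlines, one possible alternative is to strip the known prefix explicitly. This is only a sketch: parse_prefixed_json and XSSI_PREFIX are hypothetical names, the sample string is made up, and it assumes the prefix is exactly the )]}' seen above.

```python
import json

# Assumption: the guard is exactly the four characters seen in the response above
XSSI_PREFIX = ")]}'"


def parse_prefixed_json(text):
    """Remove the guard prefix if it is present, then parse the rest as JSON."""
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    # json.loads ignores leading whitespace, so the leftover newline is harmless
    return json.loads(text)


# Made-up sample; with a real response you would pass r.text instead
sample = ")]}'\n{\"featuredStoryIds\": [], \"trendingStoryIds\": [\"a\", \"b\"]}"
data = parse_prefixed_json(sample)
print(sorted(data))
```

With a real response, calling parse_prefixed_json(r.text) should give the same dictionary as the splitlines() approach, as long as the prefix has not changed.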

This gives you a decoded dictionary:

>>> json_body = r.text.splitlines()[-1]
>>> json_data = json.loads(json_body)
>>> type(json_data)
<class 'dict'>
>>> sorted(json_data)
['date', 'featuredStoryIds', 'hideAllImages', 'storySummaries', 'trendingStoryIds']

Upvotes: 3
