Here is my script:
from requests import get
x = get("https://stackoverflow.com/").json()
Here is the full error I get:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I expected it to return the HTML of StackOverflow's home page in JSON format, but I get this error instead. How can I fix this?
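Try this:
import json
import requests

resp = requests.get("https://stackoverflow.com/")
resp_dict = json.loads(resp.text)  # only succeeds if resp.text is valid JSON; an HTML body raises JSONDecodeError
print(resp_dict)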
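Note that json.loads(resp.text) decodes exactly the same text that .json() does (the traceback above shows .json() calling complexjson.loads(self.text)), so for an HTML page it raises the same JSONDecodeError. If the goal is simply to fall back to the raw text when the body is not JSON, a minimal sketch is to catch the exception. The helper name parse_json_or_text is hypothetical, and the sample strings stand in for resp.text:

```python
import json


def parse_json_or_text(text):
    """Hypothetical helper: return parsed JSON if `text` is valid JSON, else `text` unchanged."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not JSON (for example an HTML page): keep the raw text instead of crashing.
        return text


# Stand-ins for resp.text; with requests you would pass resp.text here.
print(parse_json_or_text('{"answer": 42}'))
print(parse_json_or_text("<!DOCTYPE html><html></html>"))
```

This only hides the error; it does not turn HTML into JSON, so the caller still has to check which kind of value came back.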
Doing a GET request on a URL endpoint may return any type of data. The type of data being returned can be identified by the Content-Type header of the response.
You can (and should) use the .json() method only if the Content-Type is application/json.
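A minimal, standard-library-only sketch of that check follows. The helper name is_json_content_type is hypothetical; it compares only the media type (the part before any "; charset=..." parameters) case-insensitively. Treating structured-syntax suffixes such as application/problem+json as JSON is an assumption beyond what the answer states. With requests you would pass resp.headers.get("Content-Type", "") and call resp.json() only when it returns True.

```python
def is_json_content_type(content_type):
    """Hypothetical helper: True if a Content-Type header value denotes JSON."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Assumption: "+json" suffix types (e.g. application/problem+json) also count as JSON.
    return media_type == "application/json" or media_type.endswith("+json")


print(is_json_content_type("application/json; charset=utf-8"))
print(is_json_content_type("text/html; charset=utf-8"))  # the value curl reports for stackoverflow.com
```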
For convenience's sake, let's see what curl gives (you can see what requests gives by using hdrs = requests.get("...").headers):
$ curl -I https://stackoverflow.com
HTTP/2 200
cache-control: private
content-type: text/html; charset=utf-8
x-frame-options: SAMEORIGIN
x-request-guid: 7d871e80-a0b1-4f70-96d1-5022c8b08ada
strict-transport-security: max-age=15552000
feature-policy: microphone 'none'; speaker 'none'
content-security-policy: upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com
accept-ranges: bytes
date: Thu, 19 Mar 2020 06:35:54 GMT
via: 1.1 varnish
x-served-by: cache-ams21083-AMS
x-cache: MISS
x-cache-hits: 0
x-timer: S1584599754.957920,VS0,VE88
vary: Fastly-SSL
x-dns-prefetch-control: off
set-cookie: prov=ffb66caf-2ce3-8b67-5cf0-d0ec734e9d3e; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
content-length: 112411
You can see that the Content-Type is text/html, and thus the .json() method fails: the body is an HTML page, not JSON.
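The same error can be reproduced offline, without any network request. The traceback shows .json() passing the response text to json.loads, so feeding json.loads any HTML fails on the very first character. The html_body string below is a made-up stand-in for the start of an HTML page, not Stack Overflow's actual markup:

```python
import json

# Made-up stand-in for the beginning of an HTML response body.
html_body = "<!DOCTYPE html>\n<html>\n<head><title>Example</title></head>\n</html>"

error_message = None
try:
    json.loads(html_body)  # the same call .json() makes on the response text
except json.JSONDecodeError as err:
    # "<" cannot start a JSON value, so decoding fails at line 1, column 1 (char 0).
    error_message = str(err)
    print(error_message)
```

This prints the same "Expecting value: line 1 column 1 (char 0)" message as in the question.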
To parse the HTML itself, BeautifulSoup4 is a good Python module to get started with.
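BeautifulSoup4 is a third-party package. As a dependency-free illustration of the same idea (parsing the HTML rather than calling .json()), here is a minimal sketch using the standard library's html.parser.HTMLParser to pull out the page title. TitleParser is a hypothetical class name and sample_html is a stand-in; with requests you would feed it resp.text instead.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical example: collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


sample_html = "<!DOCTYPE html><html><head><title>Example page</title></head><body></body></html>"
parser = TitleParser()
parser.feed(sample_html)
parser.close()
print(parser.title)
```

With BeautifulSoup4 installed, the rough equivalent is BeautifulSoup(resp.text, "html.parser").title.string.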