Reputation: 129
I’m trying to get the header from a website, encode it in JSON to write it to a file. I’ve tried two different ways without success.
FIRST with urllib2 and json
import urllib2
import json
host = ("https://www.python.org/")
header = urllib2.urlopen(host).info()
json_header = json.dumps(header)
print json_header
in this way I get the error:
TypeError: is not JSON serializable
So I try to bypass this issue by converting the object to a string -> json_header = str(header) In this way I can json_header = json.dumps(header) but the output it’s weird:
"Date: Wed, 02 Jul 2014 13:33:37 GMT\r\nServer: nginx\r\nContent-Type: text/html; charset=utf-8\r\nX-Frame-Options: SAMEORIGIN\r\nContent-Length: 45682\r\nAccept-Ranges: bytes\r\nVia: 1.1 varnish\r\nAge: 1263\r\nX-Served-By: cache-fra1220-FRA\r\nX-Cache: HIT\r\nX-Cache-Hits: 2\r\nVary: Cookie\r\nStrict-Transport-Security: max-age=63072000; includeSubDomains\r\nConnection: close\r\n"
SECOND with requests
import requests
r = requests.get(“https://www.python.org/”)
rh = r.headers
print rh
{'content-length': '45682', 'via': '1.1 varnish', 'x-cache': 'HIT', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=63072000; includeSubDomains', 'vary': 'Cookie', 'server': 'nginx', 'x-served-by': 'cache-fra1226-FRA', 'x-cache-hits': '14', 'date': 'Wed, 02 Jul 2014 13:39:33 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'text/html; charset=utf-8', 'age': '1619'}
In this way the output is more JSON like but still not OK (see the ‘ ‘ instead of “ “ and other stuff like the = and ;). Evidently there’s something (or a lot) I’m not doing in the right way. I’ve tried to read the documentation of the modules but I can’t understand how to solve this problem. Thank you for your help.
Upvotes: 11
Views: 30520
Reputation: 9628
I know this is an old question, but I stumbled across it when trying to put together a quick and dirty Python curl
-esque URL getter. I kept getting an error:
TypeError: Object of type 'CaseInsensitiveDict' is not JSON serializable
The above solutions are good if need to output a JSON string immediately, but in my case I needed to return a python dictionary of the headers, and I wanted to normalize the capitalization to make all keys lowercase.
My solution was to use a dict comprehension:
import requests
response = requests.head('https://www.python.org/')
my_dict = {
'body': response.text,
'http_status_code': response.status_code,
'headers': {k.lower(): v for (k, v) in response.headers.items()}
}
Upvotes: 0
Reputation: 9657
If you are only interested in the header, make a head
request. convert the CaseInsensitiveDict
in a dict
object and then convert it to json
.
import requests
import json
r = requests.head('https://www.python.org/')
rh = dict(r.headers)
json.dumps(rh)
Upvotes: 10
Reputation: 21924
There are more than a couple ways to encode headers as JSON
, but my first thought would be to convert the headers
attribute to an actual dictionary instead of accessing it as requests.structures.CaseInsensitiveDict
import requests, json
r = requests.get("https://www.python.org/")
rh = json.dumps(r.headers.__dict__['_store'])
print rh
{'content-length': ('content-length', '45474'), 'via': ('via', '1.1 varnish'), 'x-cache': ('x-cache', 'HIT'), 'accept-ranges': ('accept-ranges', 'bytes'), 'strict-transport-security': ('strict-transport-security', 'max-age=63072000; includeSubDomains'), 'vary': ('vary', 'Cookie'), 'server': ('server', 'nginx'), 'x-served-by': ('x-served-by', 'cache-iad2132-IAD'), 'x-cache-hits': ('x-cache-hits', '1'), 'date': ('date', 'Wed, 02 Jul 2014 14:13:37 GMT'), 'x-frame-options': ('x-frame-options', 'SAMEORIGIN'), 'content-type': ('content-type', 'text/html; charset=utf-8'), 'age': ('age', '1483')}
Depending on exactly what you want on the headers you can specifically access them after this, but this will give you all the information contained in the headers, if in a slightly different format.
If you prefer a different format, you can also convert your headers to a dictionary:
import requests, json
r = requests.get("https://www.python.org/")
print json.dumps(dict(r.headers))
{"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-at50-ATL", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:15 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "951"}
Upvotes: 17
Reputation: 143002
import requests
import json
r = requests.get('https://www.python.org/')
rh = r.headers
print json.dumps( dict(rh) ) # use dict()
result:
{"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-fra1224-FRA", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:04 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "3329"}
Upvotes: 1