Reputation: 523
I'm trying to find the best way to capture links listed under response headers, exactly like this one, and I'm using the Python requests module. Below is the link to the Link Headers section of the Python Requests docs: docs.python-requests.org/en/latest/user/advanced/
But in my case the response headers contain links like the ones below:
{'content-length': '12276', 'via': '1.1 varnish-v4', 'links': '<http://justblahblahblah.com/link8.html>;rel="last">,<http://justblahblahblah.com/link2.html>;rel="next">', 'vary': 'Accept-Encoding, Origin'}
Please notice the > after "last", which is not the case in the Requests examples, and I just can't seem to figure out how to solve this.
Upvotes: 40
Views: 20218
Reputation: 2018
requests already provides a way to access the Link header: response.links
It returns a dictionary of the parsed Link header values, which can easily be read further using response.links['next']['url'] to get the required values.
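A minimal sketch of the above, assuming a well-formed header. The Response object is built by hand here only so the example runs without a network call; in real code it would come from requests.get(...). As far as I can tell, response.links reads the header named Link, so a header named links (as in the question) would not be picked up this way.

```python
import requests

# Build a Response by hand so the example runs offline.
resp = requests.Response()
resp.headers["Link"] = (
    '<http://justblahblahblah.com/link8.html>; rel="last", '
    '<http://justblahblahblah.com/link2.html>; rel="next"'
)

# resp.links is a dict keyed by each link's rel value.
links = resp.links
next_url = links["next"]["url"]
last_url = links["last"]["url"]
print(next_url)
print(last_url)
```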
Upvotes: 111
Reputation: 25349
You can parse the header's value manually. To make things easier you might want to use requests' parsing function parse_header_links as a reference.
Or you can do some find/replace and use the original parse_header_links:
In [1]: import requests
In [2]: d = {'content-length': '12276', 'via': '1.1 varnish-v4', 'links': '<http://justblahblahblah.com/link8.html>;rel="last">,<http://justblahblahblah.com/link2.html>;rel="next">', 'vary': 'Accept-Encoding, Origin'}
In [3]: requests.utils.parse_header_links(d['links'].rstrip('>').replace('>,<', ',<'))
Out[3]:
[{'rel': 'last', 'url': 'http://justblahblahblah.com/link8.html'},
{'rel': 'next', 'url': 'http://justblahblahblah.com/link2.html'}]
If there might be a space or two between >, and <, then you need to do the replace with a regular expression.
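A rough sketch of that regular-expression variant, using only the standard re module. The helper name strip_stray_brackets is made up for illustration, and it assumes every link carries a parameter such as rel, so that ">" followed by a comma and "<" only ever marks the stray bracket.

```python
import re


def strip_stray_brackets(header_value):
    """Remove the extra '>' after each link's parameters (hypothetical helper)."""
    # Drop the stray '>' at the very end of the header value.
    trimmed = header_value.strip().rstrip(">")
    # Drop the stray '>' before each separator, tolerating whitespace
    # on either side of the comma.
    return re.sub(r">\s*,\s*<", ",<", trimmed)


raw = (
    '<http://justblahblahblah.com/link8.html>;rel="last">,  '
    '<http://justblahblahblah.com/link2.html>;rel="next">'
)
cleaned = strip_stray_brackets(raw)
print(cleaned)
```

The cleaned string can then be handed to requests.utils.parse_header_links in the same way as in the session above.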
Upvotes: 16