Reputation: 2944
I'm trying to request the following URL:
https://www.sainsburys.co.uk/shop/gb/groceries/shiraz/barossa-valley-estate-grenache-shiraz-mourv%C3%A8dre-75cl
Decoding it with urllib and printing it reveals it to be:
In [36]: print urllib.unquote(url)
https://www.sainsburys.co.uk/shop/gb/groceries/shiraz/barossa-valley-estate-grenache-shiraz-mourvèdre-75cl
i.e. an accented "e".
But no matter which form of the URL I pass to requests.get(...) (after import requests), I get a 404.
What is the proper input to give to the get method?
Upvotes: 1
Views: 320
Reputation: 58
You should decode the URL as 'latin-1' after passing it to urllib.unquote:
>>> import urllib, requests
>>> k = "https://www.sainsburys.co.uk/shop/gb/groceries/shiraz/barossa-valley-estate-grenache-shiraz-mourv%C3%A8dre-75cl"
>>> r = requests.get(urllib.unquote(k).decode("latin-1"))
>>> r.status_code
200
>>>
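As a rough illustration of why this seems to work, the sketch below reproduces the decoding step without touching the network. It uses the Python 3 standard library (urllib.parse) rather than the Python 2 calls above, and it assumes that requests percent-encodes any non-ASCII characters in the URL as UTF-8 before sending; the variable names are only for illustration. Under those assumptions, decoding the unquoted bytes as latin-1 turns the two bytes of "è" into two separate characters, so the path that goes out on the wire is a doubly encoded form of the original, which is apparently what this server accepts.

```python
from urllib.parse import quote, unquote_to_bytes

url = ("https://www.sainsburys.co.uk/shop/gb/groceries/shiraz/"
       "barossa-valley-estate-grenache-shiraz-mourv%C3%A8dre-75cl")

# Raw bytes after percent-decoding; comparable to what Python 2's
# urllib.unquote returns for a byte string.
raw = unquote_to_bytes(url)

# The two bytes 0xC3 0xA8 read as UTF-8 form one character, "è" ...
as_utf8 = raw.decode("utf-8")
# ... but read as latin-1 they become two characters, "Ã" and "¨".
as_latin1 = raw.decode("latin-1")

# Assumption: re-quoting as UTF-8 approximates what requests does to a
# text URL that contains non-ASCII characters.
print(quote(as_utf8, safe=":/"))    # round-trips to the original URL
print(quote(as_latin1, safe=":/"))  # path now contains %C3%83%C2%A8
```

If that reading is right, passing the original percent-encoded string, or the UTF-8 decoded one, sends %C3%A8 and gets the 404, while the latin-1 version sends %C3%83%C2%A8.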
Upvotes: 1