I am trying to open a URL with urlopen from urllib, but I get an error because of an accented character in the URL:
import urllib.request
import ssl
context = ssl._create_unverified_context()
url = 'https://en.wikipedia.org/wiki/Raúl_Grijalva'
page = urllib.request.urlopen(url, context=context)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfa' in position 12: ordinal not in range(128)
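A likely explanation, assuming CPython's http.client: urlopen() sends only the path in the HTTP request line and encodes that line as ASCII. Position 12 lines up with the ú in a line like GET /wiki/Raúl_Grijalva HTTP/1.1. The `request_line` string below is a hand-written reconstruction for illustration, not something read out of urllib:

```python
# Hand-written reconstruction of the request line urlopen() would send
# for this URL (assumption: no proxy, so only the path is sent)
request_line = "GET /wiki/Raúl_Grijalva HTTP/1.1"

try:
    request_line.encode("ascii")  # the request line is sent as ASCII bytes
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Should show the same message as the traceback above
print(error)
```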
I found this answer suggesting adding a u prefix to the string and encoding it, but this gives me a different error:
import urllib.request
import ssl
context = ssl._create_unverified_context()
url = u'https://en.wikipedia.org/wiki/Raúl_Grijalva'
page = urllib.request.urlopen(url.encode('UTF-8'), context=context)
AttributeError: 'bytes' object has no attribute 'timeout'
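The second error seems to come from how urlopen() handles its first argument: in CPython's urllib.request, a str is wrapped in a Request, and anything else is assumed to already be a Request, so setting `.timeout` on the bytes object fails. A sketch that passes a Request built from an ASCII-only, percent-encoded URL instead (the `base` name is just for illustration, and the network call is left out so the snippet runs offline):

```python
from urllib.parse import quote
from urllib.request import Request

base = "https://en.wikipedia.org/wiki/"
# Percent-encode the non-ASCII page title; the result is still a str, not bytes
req = Request(base + quote("Raúl_Grijalva"))

print(req.full_url)
# urllib.request.urlopen(req, context=context) would then be the actual request;
# it is not executed here to avoid touching the network.
```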
I did notice that the answer uses urllib.urlopen instead of urllib.request.urlopen. I'm not sure what the difference between them is, but the former raises an error saying that urllib has no such attribute.
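On the difference: urllib.urlopen is the Python 2 API, and in Python 3 it was moved to urllib.request.urlopen (the u'' prefix from that answer is also a Python 2 habit; in Python 3 all str literals are already Unicode). A quick check on Python 3:

```python
import urllib
import urllib.request

# Python 2 had urllib.urlopen; Python 3 only has urllib.request.urlopen
print(hasattr(urllib, "urlopen"))          # False on Python 3
print(hasattr(urllib.request, "urlopen"))  # True
```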
How can I properly handle this character in the url?
Using parse.quote() to percent-encode the part of the URL that contains the accented character seems to work:
from urllib import request, parse
import ssl
context = ssl._create_unverified_context()
url = 'https://en.wikipedia.org/'
path = parse.quote('wiki/Raúl_Grijalva')  # -> 'wiki/Ra%C3%BAl_Grijalva'; '/' is left alone by default
page = request.urlopen(url + path, context=context)
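If you don't want to split the URL by hand, a possible generalization is to split it with urlsplit(), quote only the path and query, and put it back together. This is a sketch under the assumption that the host name is already ASCII (an internationalized domain would also need IDNA encoding, which it does not do); `quote_url` and the chosen `safe` characters are my own choices, not a standard urllib helper:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query of url.

    Assumes the host name is already ASCII.
    """
    parts = urlsplit(url)
    # Keep path separators and '%' so already-escaped URLs are not double-encoded
    path = quote(parts.path, safe="/:@%")
    # Keep the characters that give a query string its structure
    query = quote(parts.query, safe="=&+:/?@%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(quote_url("https://en.wikipedia.org/wiki/Raúl_Grijalva"))
```

The result can be passed straight to request.urlopen(..., context=context). Keeping '%' in `safe` means calling quote_url() on an already-encoded URL leaves it unchanged.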