Using urllib to open a url with an accent

Question

I am trying to open a url using urlopen in urllib, but am getting an error due to an accent mark in the URL:

import urllib.request
import ssl
context = ssl._create_unverified_context()
url = 'https://en.wikipedia.org/wiki/Raúl_Grijalva'
page = urllib.request.urlopen(url, context=context)

UnicodeEncodeError: 'ascii' codec can't encode character '\xfa' in position 12: ordinal not in range(128)

I found this answer suggesting adding a u to the string and encoding, but this gives me a different error:

import urllib.request
import ssl
context = ssl._create_unverified_context()
url = u'https://en.wikipedia.org/wiki/Raúl_Grijalva'
page = urllib.request.urlopen(url.encode('UTF-8'), context=context)

AttributeError: 'bytes' object has no attribute 'timeout'

I did notice that the answer uses urllib.urlopen instead of urllib.request.urlopen. I'm not sure what the difference between them is, but the former raises an error saying urllib doesn't have that attribute.

How can I properly handle this character in the url?

Terry Spotts · Accepted Answer

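Using parse.quote() to percent-encode the part of the path that contains the accented character seems to work. By default it encodes the text as UTF-8 and escapes the non-ASCII bytes, so urlopen receives an ASCII-only URL: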

from urllib import request, parse
import ssl

context = ssl._create_unverified_context()
url = 'https://en.wikipedia.org/'
path = parse.quote('wiki/Raúl_Grijalva')

page = request.urlopen(url + path, context=context)
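
If the URL arrives as a single string rather than as a separate base and path, the same idea can be applied by splitting it first. The following is a minimal sketch, not a complete solution: quote_url is a hypothetical helper name (not part of urllib), it only escapes the path and query, and it assumes the host name is already ASCII (internationalized domain names would need separate handling).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper for illustration; the scheme, host and fragment
    are passed through unchanged.
    """
    parts = urlsplit(url)
    # '%' is kept in the safe set so an already-escaped URL is not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(quote_url('https://en.wikipedia.org/wiki/Raúl_Grijalva'))
# prints https://en.wikipedia.org/wiki/Ra%C3%BAl_Grijalva
```

The returned string can then be passed to request.urlopen() in place of url + path.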
