CIM

Reputation: 125

Using urllib to open a URL with an accented character

I am trying to open a URL with urllib.request.urlopen, but I get an error because of an accented character in the URL:

import urllib.request
import ssl

context = ssl._create_unverified_context()
url = 'https://en.wikipedia.org/wiki/Raúl_Grijalva'
page = urllib.request.urlopen(url, context=context)

UnicodeEncodeError: 'ascii' codec can't encode character '\xfa' in position 12: ordinal not in range(128)

I found this answer, which suggests prefixing the string with u and encoding it, but that gives me a different error:

import urllib.request
import ssl

context = ssl._create_unverified_context()
url = u'https://en.wikipedia.org/wiki/Raúl_Grijalva'
page = urllib.request.urlopen(url.encode('UTF-8'), context=context)

AttributeError: 'bytes' object has no attribute 'timeout'

I noticed that the answer uses urllib.urlopen instead of urllib.request.urlopen. I'm not sure what the difference between the two is, but the former raises an error saying that urllib has no such attribute.

How can I properly handle this character in the URL?

Upvotes: 0

Views: 279

Answers (1)

Terry Spotts

Reputation: 4045

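Using parse.quote() to percent-encode the part of the URL that contains the accented character seems to work. By default, quote() encodes the text as UTF-8 and escapes each non-ASCII byte, while leaving / untouched: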

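from urllib import request, parse
import ssl

# Unverified context, as in the question (this skips certificate checks)
context = ssl._create_unverified_context()
url = 'https://en.wikipedia.org/'
# Quote only the path, so the ':' and '//' of the scheme are not escaped
path = parse.quote('wiki/Raúl_Grijalva')

page = request.urlopen(url + path, context=context)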
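If you start from a full URL string rather than a separate base and path, the same idea can probably be applied by splitting the URL first and quoting only the path and query. This is just a rough sketch: quote_url is a name I made up for illustration, not something urllib provides, and it does not handle non-ASCII host names.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper for illustration; not part of urllib.
    """
    parts = urlsplit(url)
    # Keep '/' as a separator and '%' so already-escaped text is not escaped twice
    path = quote(parts.path, safe="/%")
    # Keep the usual query delimiters intact
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(quote_url("https://en.wikipedia.org/wiki/Raúl_Grijalva"))
```

The returned string is plain ASCII, so it can be passed to request.urlopen() in the same way as url + path above.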

Upvotes: 2
