I'm using Python 3.3.0 on Windows 7.
I made this script to bypass an HTTP proxy without authentication on a system. But when I execute it, it gives the error: `UnicodeEncodeError: 'charmap' codec can't encode characters in position 6242-6243: character maps to <undefined>`
It seems that it fails to decode Unicode characters into a string.
So what should I use or change? Does anybody have any clue or solution?
My .py file contains the following:
```python
import sys
import urllib.error
import urllib.request

url = "http://www.python.org"
proxies = {'http': 'http://199.91.174.6:3128/'}
# Note: this opener is never used below, so urlopen() never sees the `proxies` dict.
opener = urllib.request.FancyURLopener(proxies)

try:
    f = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    print("[!] The connection could not be established.")
    print("[!] Error code: ", e.code)
    sys.exit(1)
except urllib.error.URLError as e:
    print("[!] The connection could not be established.")
    print("[!] Reason: ", e.reason)
    sys.exit(1)

source = f.read()  # the response body as bytes
if "iso-8859-1" in str(source):
    source = source.decode('iso-8859-1')
else:
    source = source.decode('utf-8')
print("\n SOURCE:\n", source)
```
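`if "iso-8859-1" in str(source):`. The call to `str()` here does not decode the bytes at all: without an `encoding` argument, `str()` on a bytes object returns its repr (`"b'...'"`, with non-ASCII bytes shown as `\x..` escapes), so the check only works by accident. If you really want to keep this check (see point 2), test the bytes directly: `if b"iso-8859-1" in source:`. This works on bytes instead of strings, so no conversion is needed beforehand.

Note: this code works fine for me, presumably because my terminal uses UTF-8. A `UnicodeEncodeError` from the `'charmap'` codec is raised when *encoding* text, which here happens in the final `print()`: on Windows, Python 3.3 writes to the console using its legacy code page (see `sys.stdout.encoding`, e.g. cp437 or cp850), and that code page cannot represent every character on the page.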
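A minimal standard-library sketch of what happens to the page bytes, assuming the body has already been read into a bytes object. The sample bytes and the helper names `decode_source` and `console_safe` are hypothetical, for illustration only; cp437 stands in for whatever code page the Windows console actually uses.

```python
def decode_source(raw: bytes) -> str:
    # Hypothetical helper: look for the charset name in the raw bytes;
    # no str() conversion is needed for the check.
    if b"iso-8859-1" in raw:
        return raw.decode("iso-8859-1")
    return raw.decode("utf-8")


def console_safe(text: str, encoding: str) -> str:
    # Hypothetical helper: round-trip through the target encoding,
    # turning characters it cannot represent into '?'.
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical page body containing U+2019 (right single quotation mark), sent as UTF-8.
raw = "<p>\u2019quoted\u2019</p>".encode("utf-8")

# str() without an encoding argument does not decode; it returns the repr.
print(str(raw).startswith("b'"))  # True

text = decode_source(raw)

# Encoding for a legacy console code page such as cp437 fails on U+2019 ...
try:
    text.encode("cp437")
except UnicodeEncodeError as exc:
    print(exc.encoding)  # charmap

# ... while replacing unencodable characters keeps the output printable there.
print(console_safe(text, "cp437"))  # <p>?quoted?</p>
```

On a real console you would pass `sys.stdout.encoding` instead of a hard-coded code page. Replacement loses the affected characters, so writing the page to a file opened with `encoding="utf-8"` is the alternative when every character matters.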
Update: I recommend using python-requests (`requests`) for HTTP in Python.
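```python
import requests

proxies = {'http': your_proxy_here}  # placeholder: replace with your proxy URL string

with requests.Session() as sess:
    # Session() takes no arguments; set the proxies on the session instead.
    sess.proxies.update(proxies)
    r = sess.get('http://httpbin.org/ip')
    print(r.apparent_encoding)  # encoding guessed from the response body
    print(r.text)               # body decoded to str
    # more requests
```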
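Note: `requests` takes `r.encoding` from the HTTP headers, and `r.apparent_encoding` is only a guess based on the content; neither uses the encoding declared in the HTML itself (e.g. `<meta charset=...>`). To extract that, you would need an HTML parser such as BeautifulSoup.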
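As a dependency-free alternative to BeautifulSoup (swapped in here, not what the answer used), the standard library's `html.parser` can pull the declared charset out of a `<meta>` tag. This is a minimal sketch: `MetaCharsetParser` and `html_declared_charset` are hypothetical names, the sample page is made up, and only the two common `<meta>` forms are handled.

```python
from html.parser import HTMLParser


class MetaCharsetParser(HTMLParser):
    """Hypothetical helper: remember the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = {name: (value or "") for name, value in attrs}
        if "charset" in attrs:
            # HTML5 form: <meta charset="utf-8">
            self.charset = attrs["charset"].strip().lower()
        elif attrs.get("http-equiv", "").lower() == "content-type":
            # HTML4 form: <meta http-equiv="Content-Type" content="text/html; charset=...">
            for part in attrs.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value:
                    self.charset = value.strip().lower()


def html_declared_charset(raw: bytes):
    """Return the charset declared in the page, or None if there is no such <meta> tag."""
    parser = MetaCharsetParser()
    # The <meta> tag is ASCII, and latin-1 can decode any byte, so this never fails.
    parser.feed(raw.decode("latin-1"))
    parser.close()
    return parser.charset


# Hypothetical page body, as bytes read from the response.
page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=ISO-8859-1"></head>'
        b'<body>caf\xe9</body></html>')

charset = html_declared_charset(page) or "utf-8"  # fall back to UTF-8 if nothing is declared
print(charset)  # iso-8859-1
body = page.decode(charset)
```

The latin-1 pre-decode is only used to locate the tag; the real decode then uses whatever charset the page declares.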