chopper draw lion4

Reputation: 13497

What is the point of .decode()?

>>> import urllib.request
>>> infile = urllib.request.urlopen("http://www.yahoo.com")

With decoding:

>>> infile.read(100).decode()

'<!DOCTYPE html>\n<html lang="en-US" class="dev-desktop uni-purple-border  bkt901 https  uni-dark-purp'

Without decoding:

>>> infile.read(100)

b'le" style="">\n<!-- m2 template  -->\n<head>\n    <meta http-equiv="Content-Type" content="text/html; c'

It appears the only difference is the b prefix before the output, which I assume means bytes. Besides that, the output looks exactly the same.

Upvotes: 0

Views: 38

Answers (1)

Martijn Pieters

Reputation: 1123002

No, the output is not the same; one is a Unicode string (str), the other an undecoded bytes value.

For ASCII, the two look the same, but when you load any web page that uses characters outside the ASCII character set, the difference will be much clearer.

Take UTF-8 encoded data, for example:

>>> '–'
'–'
>>> '–'.encode('utf8')
b'\xe2\x80\x93'

That's a simple U+2013 EN DASH character. The bytes representation shows the 3 bytes UTF-8 uses to encode the codepoint.
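
To make the difference concrete, here is a minimal sketch of the round trip for that same character. It needs no network access; the variable names (text, data, wrong) are only illustrative, and the latin-1 step is my own addition to show what a mismatched codec does.

```python
# One Unicode code point: U+2013 EN DASH, written as an escape for clarity.
text = '\u2013'

# Encoding turns the str into bytes; UTF-8 needs three bytes for this character.
data = text.encode('utf8')
print(type(text).__name__, len(text))   # str 1
print(type(data).__name__, len(data))   # bytes 3

# Decoding with the same codec restores the original string.
print(data.decode('utf8') == text)      # True

# A wrong codec can "succeed" and silently produce different text:
# latin-1 maps every byte to one character, so three characters come back.
wrong = data.decode('latin-1')
print(len(wrong), wrong == text)        # 3 False

# A stricter codec refuses the data instead.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii cannot decode this:', exc.reason)
```

So .decode() is what turns the raw bytes from the network into text, and it only gives the right text when the codec matches the one the page was encoded with.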

You really want to read up on Unicode vs. encoded data here, I recommend:

Upvotes: 3
