Dzhao

Reputation: 693

Why does read() output a byte and not a string?

I am trying to analyze some binary files. Based on this article and a Tutorials Point article, I assumed that Python's read() function returned a string.

Yet when I experimented with read() myself, I got something other than what those articles described.

>>> import gzip
>>> with gzip.open('RTLog_20150424T194428.gz') as f:
...     a = f.read(3)
...     print(a)
...     type(a)
...
b'use'
<class 'bytes'>
>>> a
b'use'
>>> str(a)
"b'use'"
>>> b = 'asdfasdfasdf'
>>> type(b)
<class 'str'>
>>> 

When I tested it myself, the read() call returned a <class 'bytes'> object, not a <class 'str'> object.

What am I not getting?

Upvotes: 6

Views: 14130

Answers (3)

HYUTS

Reputation: 97

a = f.read()
str_data = a.decode("utf-8")

This worked for me on Python 3, but I am reading from a .txt file rather than a .gz file.
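
To illustrate the difference between decode() and the str(a) call from the question, here is a minimal sketch. The sample text and the variable names raw, text and repr_text are made up for the example, and UTF-8 is an assumption; the real file may use a different encoding.

```python
# Bytes like those a binary-mode read() returns (sample data, assumed UTF-8).
raw = "naïve café".encode("utf-8")

# decode() interprets the bytes using the given encoding and returns a str.
text = raw.decode("utf-8")

# str() on a bytes object does NOT decode; it returns the printable repr,
# which is why the question saw "b'use'" instead of "use".
repr_text = str(b"use")

print(type(raw).__name__, type(text).__name__)
print(text)
print(repr_text)
```

If the bytes are not valid in the chosen encoding, decode() raises UnicodeDecodeError, so this only works when you actually know the encoding.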

Upvotes: -1

dsh

Reputation: 12214

You are using Python 3. You linked to information about Python 2.

The documentation states:

As mentioned in the Overview, Python distinguishes between binary and text I/O. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as str, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.

Python 3 is very deliberate about bytes versus characters (strings). Python 2 is sloppy about it, which can cause many problems.
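
As a rough illustration of the binary/text distinction with the built-in open(), here is a small sketch. The temporary file, its name sample.txt and its contents are made up for the example, and UTF-8 is assumed as the encoding.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file, created only for this demonstration.
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("user data")

    # Binary mode: read() returns bytes, with no decoding applied.
    with open(path, "rb") as f:
        as_bytes = f.read(3)

    # Text mode (the default for open()): read() returns str,
    # decoded with the encoding passed here.
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read(3)

print(type(as_bytes), as_bytes)
print(type(as_text), as_text)
```

Note that the built-in open() defaults to text mode, whereas gzip.open() defaults to binary mode, which is why the code in the question returned bytes.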

Upvotes: 7

wim

Reputation: 362776

You can open in rb or rt mode (the default is read binary, giving you bytes). This is mentioned in the gzip.open docstring:

The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or "ab" for binary mode, or "rt", "wt", "xt" or "at" for text mode. The default mode is "rb", and the default compresslevel is 9.

If you pass the keyword argument mode="rt" when opening (and you know the right encoding), then calling the read method should return a string.
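
A minimal sketch of both modes, using a throwaway gzip file instead of the log file from the question. The file name sample.gz and its contents are made up, and UTF-8 is an assumption; substitute whatever encoding your data really uses.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample archive, created only for this demonstration.
    path = os.path.join(tmp, "sample.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("user log line\n")

    # Default mode is "rb": read() returns bytes.
    with gzip.open(path) as f:
        raw = f.read(3)

    # Text mode: read() returns str, decoded with the given encoding.
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        text = f.read(3)

print(type(raw), raw)
print(type(text), text)
```

For files that are genuinely binary rather than compressed text, staying in "rb" mode and working with the bytes directly is probably the better fit.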

Upvotes: 9
