Reputation: 693
I am trying to analyze some binary files. Based on this article and a Tutorials Point article, I assumed that Python's read() function returned a string. Yet when I experimented with read() myself, I got something other than what I had read about.
>>> import gzip
>>> with gzip.open('RTLog_20150424T194428.gz') as f:
...     a = f.read(3)
...     print(a)
...     type(a)
...
b'use'
<class 'bytes'>
>>> a
b'use'
>>> str(a)
"b'use'"
>>> b = 'asdfasdfasdf'
>>> type(b)
<class 'str'>
>>>
When I tested it myself, the read() call returned a <class 'bytes'> object, not a <class 'str'> object.
What am I not getting?
Upvotes: 6
Views: 14130
Reputation: 97
a = f.read()
str_data = a.decode("utf-8")
This worked for me in Python 3, although I was reading from a .txt file rather than a .gz file.
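A minimal sketch of the decode step above, assuming the bytes really are UTF-8 text. The sample payloads are made up for illustration and stand in for whatever f.read() returned. Arbitrary binary data usually will not decode cleanly, so the sketch also shows what a failure looks like.

```python
# Hypothetical sample standing in for the result of f.read() in binary mode.
raw = "user log: café".encode("utf-8")

# decode() converts bytes to str using the named encoding.
text = raw.decode("utf-8")

# Bytes that are not valid UTF-8 raise UnicodeDecodeError under the
# default strict error handling.
bad = b"\xff\xfe\x00binary"
try:
    bad.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lossy = bad.decode("utf-8", errors="replace")

print(type(text), decode_failed, repr(lossy))
```

If the file holds genuinely binary records rather than text, it is probably better to keep working with the bytes object than to force a decode.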
Upvotes: -1
Reputation: 12214
You are using Python 3. You linked to information about Python 2.
The documentation states:
As mentioned in the Overview, Python distinguishes between binary and text I/O. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as str, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.
Python 3 is very deliberate about bytes versus characters (strings). Python 2 is sloppy about it, which can cause many problems.
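As a rough illustration of the quoted passage, the sketch below reads the same file once in binary mode and once in text mode. The scratch file name and its contents are assumptions made up for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("héllo\n".encode("utf-8"))

# Binary mode ('b' in the mode): read() returns bytes, with no decoding.
with open(path, "rb") as f:
    as_bytes = f.read()

# Text mode with an explicit encoding: read() returns str.
with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()

print(type(as_bytes), type(as_text))
```

Passing the encoding explicitly avoids relying on the platform-dependent default mentioned in the documentation.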
Upvotes: 7
Reputation: 362776
You can open in rb or rt mode (the default is read binary, giving you bytes). This is mentioned in the gzip.open docstring:
The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or "ab" for binary mode, or "rt", "wt", "xt" or "at" for text mode. The default mode is "rb", and the default compresslevel is 9.
If you pass the keyword argument mode="rt" when opening (and you know the right encoding), then you should get a string returned when calling the read method.
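Here is a small sketch of the difference between the two modes. It assumes the compressed data is UTF-8 text, and it writes its own throwaway .gz file (a hypothetical stand-in for the log file in the question) so that it runs on its own.

```python
import gzip
import os
import tempfile

# Hypothetical stand-in for the real log file.
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(path, "wb") as f:
    f.write(b"user data\n")

# Default mode is "rb": read() returns bytes.
with gzip.open(path) as f:
    head_bytes = f.read(3)

# mode="rt" decompresses and decodes: read() returns str.
with gzip.open(path, mode="rt", encoding="utf-8") as f:
    head_text = f.read(3)

print(repr(head_bytes), repr(head_text))
```

Note that in text mode read(3) counts characters, while in binary mode it counts bytes. The two only coincide here because the sample is plain ASCII.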
Upvotes: 9