Reputation: 189
This code works in Python 2.7 and fails in 3.5, and I would like to convert it to 3.5. I am stuck at a point where iterating with a for loop changes the type of the data. I am a practiced programmer who is relatively new to Python, so this may be obvious, but my Google-fu has failed to find this exact example or a solution. So here we go:
The following are snippets from this code, which works in 2.7: http://trac.nccoos.org/dataproc/browser/DPWP/trunk/DPWP/ADCP_splitter/pd0.py
pd0.py opens a binary input stream, looks for the bytes that identify the record type, and separates the data into two files containing the appropriate data, all binary.
In the code block below, header, length and ensemble are all bytes objects. In Python 3.5, when the for loop iterates over one of them it yields ints, which then causes struct.unpack to fail. You can see in the comments where I experimented with casting and indexing, none of which worked. I want to understand in detail what is going on here so that I can write binary operations in 3.5 correctly.
What fails is value = struct.unpack('B', byte)[0]
Where I have looked for solutions:
Thanks in advance. Here is the code:
def __computeChecksum(header, length, ensemble):
    """Compute a checksum from header, length, and ensemble"""
    # these print as a byte (b'\x7f\x7f' or b'\x7fy') at this point
    print(header)  # header is a bytes object
    cs = 0
    # so, when the first byte of header is assigned to byte, it gets cast to int. Why, and how to prevent this?
    for byte in header:
        print(byte)  # this prints as an integer at this point, 127 = 0x7F because a bytes object is an "immutable sequence of integers"
        print(type(byte))  # here byte is an int - we need it to be a bytes object for unpack to work
        value = struct.unpack('B', byte)[0]  # this is the line that gets TypeError: a bytes-like object is required, not 'int'
        # this does not work either - from examples online I thought that referencing the first in the array was the problem
        #value = struct.unpack('B', byte)  # this is the line that gets TypeError: a bytes-like object is required, not 'int'
        # this does not work, the error is unpack requires a bytes object of length 1, so the casting happened
        #value = struct.unpack('B', bytes(byte))[0]
        # and this got the error a bytes-like object is required, not 'int', so the [0] reference generates an int
        # value = struct.unpack('B', bytes(byte)[0])[0]
        cs += value
    for byte in length:
        value = struct.unpack('B', byte)[0]
        cs += value
    for byte in ensemble:
        value = struct.unpack('B', byte)[0]
        cs += value
    return cs & 0xffff

# convenience function reused for header, length, and checksum
def __nextLittleEndianUnsignedShort(file):
    """Get next little endian unsigned short from file"""
    raw = file.read(2)
    """for python 3.5, struct.unpack('<H', raw)[0] needs to return a
    byte, not an int
    Note that it's not a problem here, but in the next cell, when a for loop is involved, we get an error
    """
    return (raw, struct.unpack('<H', raw)[0])
Code in the main program which calls the functions above:
while (header == wavesId) or (header == currentsId):
    print('recnum= ',recnum)
    # get ensemble length
    rawLength, length = __nextLittleEndianUnsignedShort(rawFile)
    # read up to the checksum
    rawEnsemble = rawFile.read(length-4)
    # get checksum
    rawChecksum, checksum = __nextLittleEndianUnsignedShort(rawFile)

    computedChecksum = __computeChecksum(rawHeader, rawLength, rawEnsemble)

    if checksum != computedChecksum:
        raise IOError('Checksum error')
And finally, the full text of the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-5e60bd9b9a54> in <module>()
13 rawChecksum, checksum = __nextLittleEndianUnsignedShort(rawFile)
14
---> 15 computedChecksum = __computeChecksum(rawHeader, rawLength, rawEnsemble)
16
17 if checksum != computedChecksum:
<ipython-input-3-414811fc52e4> in __computeChecksum(header, length, ensemble)
16 print(byte) # this prints as an integer at this point, 127 = 0x7F because a bytes object is a "mutable sequence of integers"
17 print(type(byte)) # here byte is an int - weneed it to be a bytes object for unpack to work
---> 18 value = struct.unpack('B', byte)[0] # this is the line that gets TypeError: a bytes-like object is required, not 'int'
19 # this does not work either - from examples online I thought that referencing the first in the array was the problem
20 #value = struct.unpack('B', byte) # this is the line that gets TypeError: a bytes-like object is required, not 'int'
TypeError: a bytes-like object is required, not 'int'
The full python notebook is here: https://gist.github.com/mmartini-usgs/4795da39adc9905f70fd8c27a1bba3da
Upvotes: 4
Views: 9597
Reputation: 189
The most elegant solution turned out to be simply:
ensemble = infile.read(ensemblelength)

def __computeChecksum(ensemble):
    cs = 0
    # indexing a bytes object yields an int in Python 3, so no struct.unpack is needed
    for byte in range(len(ensemble)-2):
        cs += ensemble[byte]
    return cs & 0xffff
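Since iterating over a bytes object in Python 3 already yields ints, the loop can be collapsed further. This is only a minimal sketch, assuming (as above) that ensemble holds the whole record with its 2-byte checksum at the end. The name compute_checksum and the sample record are made up for illustration and are not part of pd0.py:

```python
def compute_checksum(ensemble):
    """Sum every byte except the trailing 2 bytes, modulo 65536."""
    # Slicing bytes gives bytes; sum() iterates over it and adds up the ints.
    return sum(ensemble[:-2]) & 0xFFFF


# Made-up 6-byte record: 4 data bytes followed by 2 placeholder checksum bytes.
record = b'\x7f\x7f\x01\x02' + b'\x01\x01'
print(compute_checksum(record))  # 127 + 127 + 1 + 2 = 257
```

The slice ensemble[:-2] covers the same bytes as range(len(ensemble)-2) in the loop version, so the two should give the same result.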
Upvotes: 2
Reputation: 599
It is complicated to answer without knowing what the header is and how the data is being read. In theory, if you read it with rb (read binary), that should not happen. (That was in the comments actually.)
Here is a better explanation of the problem: iterate over individual bytes in python3
I would take the int using an if-clause, but you can re-cast to bytes like in that answer. Also, take a look at numpy.fromfile. It is easier to use IMO.
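To make the re-casting idea concrete, here is a rough sketch of the options I know of, using only the standard library. The sample value data is made up for illustration; the rest relies on built-in Python 3 behaviour:

```python
import struct

data = b'\x7f\x79'  # made-up two-byte sample

# Iterating (or indexing) bytes yields ints in Python 3, so no unpack is needed.
as_ints = list(data)

# A length-1 slice stays a bytes object, which struct.unpack accepts.
via_slice = [struct.unpack('B', data[i:i + 1])[0] for i in range(len(data))]

# bytes([n]) wraps one int back into a single byte. Note that bytes(n)
# without the list creates n zero bytes instead.
via_recast = [struct.unpack('B', bytes([b]))[0] for b in data]

# Or unpack the whole buffer in one call with a repeat count.
via_count = list(struct.unpack('%dB' % len(data), data))

print(as_ints, via_slice, via_recast, via_count)
```

This would also explain the "requires a bytes object of length 1" error mentioned in the question: bytes(byte) with byte equal to 127 builds 127 zero bytes rather than one byte with the value 127.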
PS: That is quite a big post with tons of details! You'll probably get more meaningful answers if you follow SSCCE. And you can always post the link to the full notebook like you did ;-)
I would re-write your question with only your comments like:
When iterating over bytes on Python 3.x I get ints instead of bytes. Is it possible to get all bytes instead?
In [0]: [byte for byte in b'\x7f\x7f']
Out[0]: [127, 127]
Upvotes: 0