Reputation: 137
The problem is that stdin doesn't support seek, which Avro needs, so we read everything into a buffer and then give that buffer to avro_wrapper. This works in Python 2 but not in Python 3. I have tried a few solutions, but none of them work.
# stdin doesn't support seek which is needed by avro... so this hack worked in python 2. This does not work in Python 3.
# Reading everything to buffer and then giving this to avro_wrapper.
buf = StringIO()
buf.write(args.input_file.read())
r = DataFileReader(buf, DatumReader())
# The very first record holds the header information: the header names in order,
# along with the munged header names for all the record types.
# E.g. if we have 2 ports, it will hold the header information of
# 1. port1 under the name1 key
# 2. port2 under the name2 key, and so on
headers_record = next(r)['headers']
The above produces the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 17: invalid continuation byte
We then tried doing it this way:
input_stream = io.TextIOWrapper(args.input_file.buffer, encoding='latin-1')
sio = io.StringIO(input_stream.read())
r = DataFileReader(sio, DatumReader())
headers_record = next(r)['headers']
This produces the error avro.schema.AvroException: Not an Avro data file: Obj doesn't match b'Obj\x01'.
Another way:
input_stream = io.TextIOWrapper(args.input_file.buffer, encoding='latin-1')
buf = io.BytesIO(input_stream.read().encode('latin-1'))
r = DataFileReader(buf.read(), DatumReader())
headers_record = next(r)['headers']
This produces the error AttributeError: 'bytes' object has no attribute 'seek'
Upvotes: 1
Views: 2055
Reputation: 1121306
io.BytesIO() is the correct type to use to create a seekable in-memory file object containing binary data.
However, you made the mistake of reading the bytes data out of your io.BytesIO() file object and passing that in instead of the actual file object.
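The difference can be checked without Avro at all. The following is a minimal sketch using only the standard library; the payload bytes are made up for illustration and are not a real Avro file.

```python
import io

# Hypothetical payload standing in for the Avro container bytes read from stdin.
payload = b"Obj\x01" + bytes([0xF0, 0x9F]) + b"rest"

buf = io.BytesIO(payload)

# The in-memory file object supports seek() ...
file_is_seekable = buf.seekable()

# ... but what read() returns is a plain bytes object, not a file.
data = buf.read()
bytes_have_seek = hasattr(data, "seek")

print(file_is_seekable, bytes_have_seek)
```

Anything that calls .seek() on its input, as the traceback in the question shows the reader doing, therefore needs buf itself rather than the result of buf.read().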
Don't read; pass in the actual io.BytesIO file object holding the binary data read from stdin:
buf = io.BytesIO(args.input_file.buffer.read())
r = DataFileReader(buf, DatumReader())
I passed in the args.input_file.buffer data directly, assuming that args.input_file is the TextIOWrapper instance that decodes the stdin bytes, and .buffer is the underlying BufferedReader instance providing the raw binary data. There is no point in decoding this data as Latin-1, then encoding it as Latin-1 again. Just pass the bytes on.
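As a rough illustration of both points, here is a sketch that simulates stdin with an in-memory text wrapper. The helper name to_seekable_bytes and the payload are hypothetical, chosen only for this example; they are not part of Avro or of the original script.

```python
import io


def to_seekable_bytes(stream):
    """Copy all remaining binary data from stream into an io.BytesIO.

    Hypothetical helper: text wrappers such as sys.stdin expose their raw
    bytes via .buffer; streams already in binary mode are read directly.
    """
    raw = getattr(stream, "buffer", stream)
    return io.BytesIO(raw.read())


# Made-up binary payload containing a byte (0xf0) that is not valid UTF-8 here.
payload = b"Obj\x01\xf0\x00binary"

# Stand-in for sys.stdin: a text wrapper around a binary stream.
fake_stdin = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")

buf = to_seekable_bytes(fake_stdin)

# Latin-1 maps bytes 0-255 to code points 0-255, so decoding and then
# re-encoding yields the same bytes: the round trip is wasted work.
roundtrip = payload.decode("latin-1").encode("latin-1")

print(buf.seekable(), buf.getvalue() == payload, roundtrip == payload)
```

In the real script, the resulting buf would be handed to DataFileReader in place of fake_stdin's contents, exactly as in the two-line fix above.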
Upvotes: 1