Stefan Falk

How to stream files from tarfile for reading?

I am trying to read WAV files from a tar archive stored in a bucket. Because there are many files, I do not want to extract them first.

Instead, I would like to read the data from the archive and stream it to wavfile.read (from scipy.io):

with tf.gfile.Open(chunk_fp, mode='rb') as f:
    with tarfile.open(fileobj=f, mode='r|*') as tar:
        for member in ds_text.index.values:
            bytes = BytesIO(tar.extractfile(member))  # Obviously not working
            rate, wav_data = wavfile.read(bytes)
            # Do stuff with data ..

However, I am not able to get my hands on a stream that wavfile.read can work with.

Trying different things gets me different errors:

 tar.extractfile(member).seek(0)

{AttributeError}'_Stream' object has no attribute 'seekable'

 tar.extractfile(member).raw.read()

{StreamError}seeking backwards is not allowed

and so on.

Any ideas how I can achieve this?

Answers (1)

Stefan Falk

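It turns out that I opened the archive in the wrong mode. The r|* mode treats the archive as a forward-only stream of blocks and does not allow seeking backwards, whereas r:* opens it for random access with transparent compression. Using r:* instead of r|* works: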

with tarfile.open(fileobj=f, mode='r:*') as tar:
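
For illustration, here is a minimal, self-contained sketch of the same idea. It is not the original code: it builds a small tar archive in memory instead of reading one from a bucket, and it uses the standard-library wave module as a stand-in for scipy.io.wavfile.read so that it runs without extra dependencies. The helper names (make_wav_bytes, make_tar_bytes, read_wavs) and the member names are hypothetical.

```python
import io
import tarfile
import wave


def make_wav_bytes(n_frames=100, rate=8000):
    """Build a tiny mono 16-bit WAV file (silence) in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def make_tar_bytes(files):
    """Pack a {name: bytes} mapping into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_wavs(fileobj, names):
    """Yield (name, sample_rate, n_frames) for each requested member."""
    # 'r:*' opens the archive for random access, so members can be
    # looked up by name and read in any order.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for name in names:
            member = tar.extractfile(name)
            # Copy the member into a BytesIO so the WAV reader gets a
            # fully seekable in-memory buffer.
            data = io.BytesIO(member.read())
            with wave.open(data, "rb") as w:
                yield name, w.getframerate(), w.getnframes()


archive = make_tar_bytes({
    "a.wav": make_wav_bytes(100),
    "b.wav": make_wav_bytes(50, rate=16000),
})
# Members are requested out of archive order on purpose; this relies on
# the random access that 'r:*' provides.
results = list(read_wavs(io.BytesIO(archive), ["b.wav", "a.wav"]))
```

With scipy installed, the BytesIO buffer could presumably be passed to wavfile.read in place of wave.open, since wavfile.read accepts an open file-like object. Note that 'r:*' requires the underlying file object to support seeking.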
