MYK

Reputation: 3007

How do you process Tar Files in memory using BytesIO or StringIO (Python)?

I am using requests to get a .tar.gz object.

That is, I run

response = requests.get(DOWNLOAD_URL)

and the contents of the response are the bytes in a .tar.gz file.

I could do

import tarfile

with open('tmp.tar.gz', 'wb') as file:
    file.write(response.content)

answer = {}

with tarfile.open('tmp.tar.gz') as tar:
    for member in tar:
        if member.name != '.':
            answer[member.name] = tar.extractfile(member).read()
            
# contents are in `answer`
answer

But I don't think the round trip to the hard drive is necessary. I tried to do this with io.BytesIO or io.StringIO instead, but couldn't get it working.

Upvotes: 0

Views: 1066

Answers (1)

chepner

Reputation: 531325

tarfile.open can take a fileobj keyword argument to specify the BytesIO object to work with.

with tarfile.open(fileobj=io.BytesIO(response.content)) as tar:

(Strictly speaking, you could use a positional argument, too, but then you'd have to explicitly provide None as the file name and a mode argument, like

tarfile.open(None, 'r', io.BytesIO(response.content))

but ugh, who wants to do that?)
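
For completeness, here is a rough sketch of the whole loop from the question done in memory. Since a tar archive is binary data, io.BytesIO is the one to use; io.StringIO holds text and won't work here. The helper names read_tar_bytes and make_sample_tar_gz are made up for illustration, and the sample archive just stands in for response.content so the snippet runs without a network call.

```python
import io
import tarfile


def read_tar_bytes(data):
    """Map member name -> file contents for each regular file in an in-memory tar archive."""
    contents = {}
    # The default mode 'r' detects gzip/bz2/xz compression transparently.
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        for member in tar:
            # extractfile() returns None for directories, so skip non-files.
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents


def make_sample_tar_gz():
    """Build a tiny .tar.gz in memory (hypothetical stand-in for response.content)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        payload = b"hello"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


sample = make_sample_tar_gz()
answer = read_tar_bytes(sample)
```

With a real download you would call read_tar_bytes(response.content) instead. Checking member.isfile() replaces the member.name != '.' test from the question and also skips any other directory entries.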

Upvotes: 1
