Reputation: 7824
I'm trying to create a tar archive in Python and, during creation, stream its bytes to a remote host. The communication with the remote host uses a custom protocol in which each message/packet carries a payload of a specific size.
To try creating and reading a tar archive in parallel, I wrote the following simple test script:
import tarfile
import threading
import os
import select

BLOCKSIZE = 4096

(r, w) = os.pipe()
wfd = os.fdopen(w, "w")

def maketar(buf, paths):
    tar = tarfile.open(mode='w|', fileobj=buf)
    for p in paths:
        tar.add(p)
    tar.close()

x = threading.Thread(target=maketar, args=(wfd, ["1M", "2M"]))
x.start()

poller = select.poll()
poller.register(r, select.POLLIN)

with open("out/archive.tar", "wb") as outf:
    while True:
        if poller.poll(10):
            outf.write(os.read(r, BLOCKSIZE))
        elif not x.is_alive():
            break
The files 1M and 2M are supposed to be packed into out/archive.tar. However, the archive is corrupt after the script finishes:
$ tar xf archive.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
$ ls -la
total 4.0M
-rw-rw-r-- 1 xx xx 1.0M Nov 8 11:37 1M
-rw-rw-r-- 1 xx xx 1023K Nov 8 13:12 2M
-rw-rw-r-- 1 xx xx 2.0M Nov 8 12:55 archive.tar
Both files should be 1M in size; the size of the archive is approximately correct, but the extracted 2M is too small. What am I missing here? Is it a buffering issue with the os.pipe() file descriptors?
Upvotes: 0
Views: 198
Reputation: 7824
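Turns out I simply needed to call buf.flush() on the write buffer at the end of the maketar() function. os.fdopen() returns a buffered file object, so the last bytes of the archive were still sitting in its buffer when the thread exited and the reader loop stopped. It works fine now.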
def maketar(buf, paths):
    tar = tarfile.open(mode='w|', fileobj=buf)
    for p in paths:
        tar.add(p)
    tar.close()
    buf.flush()
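For reference, here is a minimal sketch of the whole script with that fix applied, assuming Python 3 (so the pipe is opened in binary mode). The names stream_tar and send are hypothetical: send stands in for whatever pushes one payload to the remote host. Instead of polling and checking whether the thread is alive, this variant closes the write end of the pipe, which both flushes the buffer and lets the reader see EOF.

```python
import os
import tarfile
import threading

BLOCKSIZE = 4096  # payload size per message (assumed value)


def maketar(buf, paths):
    """Write a streamed tar archive of `paths` to `buf`."""
    try:
        with tarfile.open(mode="w|", fileobj=buf) as tar:
            for p in paths:
                tar.add(p)
    finally:
        # close() flushes the buffered writer and closes the write end
        # of the pipe, so the reader gets EOF once all data is consumed.
        buf.close()


def stream_tar(paths, send):
    """Create the archive in a thread and pass it to `send` in chunks.

    `send` is a hypothetical callback taking one bytes object.
    """
    r, w = os.pipe()
    wfd = os.fdopen(w, "wb")
    writer = threading.Thread(target=maketar, args=(wfd, paths))
    writer.start()
    with os.fdopen(r, "rb") as rfd:
        while True:
            # A buffered read returns BLOCKSIZE bytes unless EOF is hit,
            # so only the final chunk can be shorter.
            chunk = rfd.read(BLOCKSIZE)
            if not chunk:
                break
            send(chunk)
    writer.join()
```

To reproduce the original test, open out/archive.tar in "wb" mode and call stream_tar(["1M", "2M"], outf.write).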
Upvotes: 1