Reputation: 173
I've been messing around with sockets in Python and I'd like to be able to send a sparse image file from one machine to another. As expected, sending a sparse file over a Python socket doesn't preserve the sparseness of the file. I'd like to create a sparse tar archive and send it that way, but I just can't figure it out.
The tarfile module says it supports reading sparse files with the GNU format, which doesn't help me create them... but the Python docs say the pax format has "virtually no limits". I'm not sure whether that means I can create an archive that preserves the sparse file using the pax format... I've been trying, but I have no idea how it might work.
If this solution isn't an option, is there any other way to send a sparse file over a socket? I hate to have to call 'tar -xSf' via a system command from my application...
Thanks,
Server
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.bind((socket.gethostname(), 50001))
s.listen(1)
img = open('test.img', 'rb')
client, addr = s.accept()
l = img.read(8192)
while l:
    client.sendall(l)  # sendall keeps sending until the whole chunk is out
    l = img.read(8192)
img.close()
client.close()  # closing the connection tells the client the file is complete
s.close()
Client
import socket

host = ''
port = 50001
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.connect((host, port))
img = open('./newimg.img', 'wb')
l = s.recv(8192)
while l:  # recv returns b'' once the server closes the connection
    img.write(l)
    l = s.recv(8192)
img.close()
s.close()
On the server, I make a new sparse file: truncate -s 1G test.img
A du -h shows: 0 test.img
I run my server and client. Here is du -h on the transferred file: 1.0G newimg.img
As you can see, the transfer expands the file and it is no longer sparse.
Upvotes: 1
Views: 770
Reputation: 7384
Holes in files are normally created when you write to the beginning of a file, seek far ahead and write there: the range you skipped is never allocated. When you read the file you get zeros for that range, just as if they were stored on disk. So when you send the file, those zeros are read and sent as literal bytes. The receiver then writes every one of them, so the filesystem allocates real blocks and no holes are created.
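To make that concrete, here is a minimal sketch that creates a hole by seeking and then compares the apparent size with the space actually allocated. The helper names make_sparse_file and allocated_bytes are made up for this illustration, and it assumes a Unix-like system where os.stat reports st_blocks in 512-byte units. Whether a hole is really created depends on the filesystem.

```python
import os
import tempfile

def make_sparse_file(path, size):
    """Create a file of `size` bytes that only writes its final byte."""
    with open(path, "wb") as f:
        f.seek(size - 1)   # skip ahead without writing anything
        f.write(b"\x01")   # the skipped range can become a hole

def allocated_bytes(path):
    """Space allocated on disk (st_blocks counts 512-byte units)."""
    return os.stat(path).st_blocks * 512

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.img")
    make_sparse_file(path, 10 * 1024 * 1024)
    apparent = os.stat(path).st_size
    on_disk = allocated_bytes(path)
    with open(path, "rb") as f:
        first = f.read(4096)   # reading inside the skipped range

# The apparent size is the full 10 MiB and the read returns zeros;
# on a filesystem that supports holes, on_disk is much smaller.
print(apparent, on_disk, first == b"\x00" * 4096)
```

This is the same check that comparing ls -l with du does from the shell.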
To avoid losing the holes, you can first find them in the file, send their positions, and then send only the data between them.
The following is not polished but should give you a starting point.
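import errno
import os

f = open(path, "rb")
fd = f.fileno()
end = os.fstat(fd).st_size
holes = []  # list of (start, stop) byte offsets
offset = 0
while offset < end:
    offset = os.lseek(fd, offset, os.SEEK_HOLE)  # next hole (EOF counts as one)
    if offset >= end:
        break
    try:
        end_hole = os.lseek(fd, offset, os.SEEK_DATA)  # where data resumes
    except OSError as e:
        if e.errno != errno.ENXIO:
            raise
        end_hole = end  # no more data: the hole runs to the end of the file
    holes.append((offset, end_hole))
    offset = end_hole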
[open socket and stuff]
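import json

# sock is the connected socket from the step above
# send the holes; the trailing newline marks the end of the JSON
sock.sendall(json.dumps(holes).encode() + b"\n")
# send only the data between the holes
pos = 0
for start, stop in holes + [(end, end)]:  # the extra entry covers data after the last hole
    f.seek(pos)
    while pos < start:
        l = f.read(min(8192, start - pos))  # never read into the hole
        if not l:
            break  # file is shorter than expected
        sock.sendall(l)
        pos += len(l)
    pos = stop  # skip over the hole
f.close()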
Then on the client side:
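import json

# read the hole list; this assumes the sender ends the JSON with a newline
buf = b""
while b"\n" not in buf:
    l = s.recv(8192)
    if not l:
        raise ConnectionError("connection closed before the hole list arrived")
    buf += l
header, _, rest = buf.partition(b"\n")
holes = json.loads(header)  # rest holds any file data that arrived with the JSON

def chunks():
    # file data: first what came after the JSON, then the rest of the stream
    if rest:
        yield rest
    while True:
        data = s.recv(8192)
        if not data:
            return
        yield data

fout = open(outfile, "wb")
pos = 0  # current offset in the output file
i = 0    # index of the next hole
for chunk in chunks():
    while chunk:
        if i < len(holes) and pos == holes[i][0]:
            pos = holes[i][1]  # skip the hole instead of writing zeros
            fout.seek(pos)
            i += 1
        limit = holes[i][0] - pos if i < len(holes) else len(chunk)
        fout.write(chunk[:limit])
        pos += len(chunk[:limit])
        chunk = chunk[limit:]
if i < len(holes):
    pos = holes[i][1]  # the file ends with a hole
fout.truncate(pos)  # set the final size without writing zeros
fout.close()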
There are probably bugs in it and you should rethink each line, but the general principle should be alright. JSON was chosen arbitrarily; other protocols are probably a better fit for this case. You could also create your own.
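As one possible take on "create your own", here is a sketch of a small binary framing that drops the JSON: each data extent is sent as an (offset, length) header followed by its bytes, and a zero-length record carrying the total file size ends the stream. The names HEADER, data_extents, send_sparse, recv_exact and recv_sparse are invented for this example, not part of any library. It assumes Python 3 on a platform where os.SEEK_DATA and os.SEEK_HOLE are available (for example Linux).

```python
import errno
import os
import struct

# Hypothetical wire format: two unsigned 64-bit integers in network byte order.
HEADER = struct.Struct("!QQ")

def data_extents(fd, size):
    """Yield (offset, length) for each region of the file that holds data."""
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # only a hole remains before EOF
                return
            raise
        stop = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, stop - start
        offset = stop

def send_sparse(sock, path):
    """Send only the data extents of `path`, then an end record."""
    # buffering=0 so f.seek/f.read and os.lseek share one file position
    with open(path, "rb", buffering=0) as f:
        size = os.fstat(f.fileno()).st_size
        for offset, length in data_extents(f.fileno(), size):
            sock.sendall(HEADER.pack(offset, length))
            f.seek(offset)
            while length:
                chunk = f.read(min(65536, length))
                if not chunk:
                    raise EOFError("file shrank while sending")
                sock.sendall(chunk)
                length -= len(chunk)
        sock.sendall(HEADER.pack(size, 0))  # length 0 marks the end

def recv_exact(sock, n):
    """Receive exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return bytes(buf)

def recv_sparse(sock, path):
    """Rebuild the file, seeking over the gaps so they can stay holes."""
    with open(path, "wb") as out:
        while True:
            offset, length = HEADER.unpack(recv_exact(sock, HEADER.size))
            if length == 0:
                out.truncate(offset)  # final size; a trailing hole stays unwritten
                return
            out.seek(offset)
            while length:
                chunk = recv_exact(sock, min(65536, length))
                out.write(chunk)
                length -= len(chunk)
```

The server would call send_sparse(client, 'test.img') after accept(), and the client would call recv_sparse(s, './newimg.img') after connect(). Because every record states its own length, the receiver never has to guess where the header ends and the file data begins.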
Upvotes: 1