Reputation: 46483
When uploading 100 files of 100 bytes each with SFTP, it takes 17 seconds here (after the connection is established; I don't even count the initial connection time). That's 17 seconds to transfer only 10 KB, i.e. 0.59 KB/sec!
I know that sending SSH/SFTP requests to open, write, close, etc. probably creates a big overhead, but still, is there a way to speed up the process when sending many small files with SFTP?
Or is there a special mode in paramiko / pysftp to keep all the write operations in a memory buffer (say, all the operations of the last 2 seconds) and then perform them all in one grouped SSH/SFTP pass? That would avoid waiting for the round-trip time between each operation.
Note: here is the test code I'm using:
import pysftp, time, os

with pysftp.Connection('1.2.3.4', username='root', password='') as sftp:
    with sftp.cd('/tmp/'):
        t0 = time.time()
        for i in range(100):
            print(i)
            with sftp.open('test%i.txt' % i, 'wb') as f:  # even worse in a+ append mode: it takes 25 seconds
                f.write(os.urandom(100))
        print(time.time() - t0)
Upvotes: 3
Views: 4216
Reputation: 202272
I'd suggest you parallelize the upload using multiple connections from multiple threads. That's an easy and reliable solution.
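For illustration, a minimal sketch of that approach could look like the following, where the host, credentials, local file names and remote path are placeholders; each worker thread opens its own connection and uploads its own batch of files:
# Sketch only: parallel upload, one SFTP connection per worker thread.
# Host, credentials, local file names and remote paths are placeholders.
import concurrent.futures
import paramiko

FILES = ["%i.dat" % i for i in range(100)]

def upload_batch(batch):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect("example.com", username="user", password="")
    sftp = ssh.open_sftp()
    try:
        for name in batch:
            sftp.put(name, "/remote/path/" + name)
    finally:
        sftp.close()
        ssh.close()

# Split the files into N batches; each batch gets its own thread and connection.
N = 10
batches = [FILES[i::N] for i in range(N)]
with concurrent.futures.ThreadPoolExecutor(max_workers=N) as executor:
    list(executor.map(upload_batch, batches))
Using one connection per thread also means no SFTP session is shared between threads.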
If you want to do it the hard way, buffering the requests yourself, you can base your solution on the following naive example.
For example: if I do a plain SFTPClient.put for 100 files, it takes about 10-12 seconds. Using the code below, I achieve the same result about 50-100 times faster.
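For reference, that plain baseline is just a sequential loop of blocking put calls over an already-open SFTPClient, roughly like this (hypothetical file names and paths):
# Baseline for comparison (hypothetical paths): one blocking put() per file,
# so every file waits for its own open/write/close round trips.
for i in range(0, 100):
    sftp.put(f"{i}.dat", f"/remote/path/{i}.dat")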
But! The code is really naive:
- It expects each file to be uploaded in a single request, i.e. that upload.localhandle.read(32*1024) reads the whole file. That's true for small files only.
- It relies on non-public internals of the SFTPClient class.
import paramiko
import paramiko.sftp
from paramiko.py3compat import long  # py3compat may be absent in newer Paramiko releases; a plain int offset works too
ssh = paramiko.SSHClient()
ssh.connect(...)
sftp = ssh.open_sftp()
class Upload:
    def __init__(self):
        pass

uploads = []
for i in range(0, 100):
    print(f"sending open request {i}")
    upload = Upload()
    upload.i = i
    upload.localhandle = open(f"{i}.dat", "rb")  # open in binary mode for SFTP writes
    upload.remotepath = f"/remote/path/{i}.dat"
    imode = \
        paramiko.sftp.SFTP_FLAG_CREATE | paramiko.sftp.SFTP_FLAG_TRUNC | \
        paramiko.sftp.SFTP_FLAG_WRITE
    attrblock = paramiko.SFTPAttributes()
    upload.request = \
        sftp._async_request(type(None), paramiko.sftp.CMD_OPEN, upload.remotepath,
                            imode, attrblock)
    uploads.append(upload)
for upload in uploads:
    print(f"reading open response {upload.i}")
    t, msg = sftp._read_response(upload.request)
    if t != paramiko.sftp.CMD_HANDLE:
        raise paramiko.sftp.SFTPError("Expected handle")
    upload.handle = msg.get_binary()
    print(f"sending write request {upload.i} to handle {upload.handle}")
    data = upload.localhandle.read(32*1024)
    upload.request = \
        sftp._async_request(type(None), paramiko.sftp.CMD_WRITE,
                            upload.handle, long(0), data)

for upload in uploads:
    print(f"reading write response {upload.i} {upload.request}")
    t, msg = sftp._read_response(upload.request)
    if t != paramiko.sftp.CMD_STATUS:
        raise paramiko.sftp.SFTPError("Expected status")
    print(f"closing {upload.i} {upload.handle}")
    upload.request = \
        sftp._async_request(type(None), paramiko.sftp.CMD_CLOSE, upload.handle)

for upload in uploads:
    print(f"reading close response {upload.i} {upload.request}")
    sftp._read_response(upload.request)
Upvotes: 3
Reputation: 46483
With the following method (100 asynchronous tasks), it's done in ~ 0.5 seconds, which is a massive improvement.
import asyncio, asyncssh  # pip install asyncssh

async def main():
    async with asyncssh.connect('1.2.3.4', username='root', password='') as conn:
        async with conn.start_sftp_client() as sftp:
            print('connected')
            # gather the 100 uploads; asyncio.wait() no longer accepts bare
            # coroutines on Python 3.11+
            await asyncio.gather(*(sftp.put('files/test%i.txt' % i) for i in range(100)))

asyncio.run(main())
I'll explore the source, but I still don't know whether it groups many operations into a few SSH transactions, or whether it just runs the commands in parallel.
Upvotes: 6