Reputation: 768
I'm on Ubuntu 12.04 with pycurl 7.19 and libcurl3 7.22 (both installed directly from the Ubuntu repositories with apt-get). My code to upload a file is ("self" is a wrapper object of mine):
self.curlTransfer = pycurl.Curl()
self.curlTransfer.setopt(pycurl.UPLOAD, 1)
self.curlTransfer.setopt(pycurl.USERPWD, '%s:%s'%(str(self.userName), str(self.password)))
self.curlTransfer.setopt(pycurl.NOPROGRESS, 0)
self.curlTransfer.setopt(pycurl.PROGRESSFUNCTION, self.__UpdateFileTransferProgress)
f = open(fileName, 'rb')
self.curlTransfer.setopt(pycurl.URL, 'ftp://' + self.ipAddress + self.path + destination)
self.curlTransfer.setopt(pycurl.INFILESIZE_LARGE, os.path.getsize(fileName))
self.curlTransfer.setopt(pycurl.READFUNCTION, f.read)
self.curlTransfer.perform()
My callback function "__UpdateFileTransferProgress" gets called thousands of times per second to the point where the transfer is ~3X slower than if I turn off the progress callback. I have searched far and wide to resolve this and the only relevant thing I have found is this curl bug report. It sounds like the bug may have been patched, but it's unclear if the patch has made it into my version (or if this is a different issue entirely).
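For reference, the callback itself is nearly trivial; a stripped-down version of what I pass as PROGRESSFUNCTION looks roughly like this (transferProgress is just a placeholder attribute name for this sketch — pycurl hands the callback four byte counts):
def __UpdateFileTransferProgress(self, downloadTotal, downloaded, uploadTotal, uploaded):
    # pycurl calls this as (download_total, downloaded, upload_total, uploaded), all in bytes.
    # For an upload only the last two matter.
    if uploadTotal > 0:
        self.transferProgress = float(uploaded) / float(uploadTotal)  # placeholder attribute
    return 0  # A nonzero return would abort the transfer.
My suspicion is that the cost is the C-to-Python call on every libcurl progress tick rather than anything the callback body does.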
Has anyone come across this? I looked into updating to the latest libcurl/pycurl versions manually, but trying to work through the dependencies deterred me. I really like the performance of pycurl compared to ftplib (when the progress callback is disabled), but I need the callback function to track the transfer progress.
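For what it's worth, pycurl reports which libcurl it is linked against, so it's easy to check whether an upgrade actually took effect:
import pycurl

# Something like 'PycURL/7.19.0 libcurl/7.22.0 GnuTLS/2.12.14 zlib/1.2.3.4 ...'
print(pycurl.version)

# version_info() returns a tuple; index 1 is the libcurl version string, e.g. '7.22.0'.
print(pycurl.version_info()[1])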
Upvotes: 2
Views: 1308
Reputation: 768
SEE EDIT FOR CLEANER SOLUTION!
I bit the bullet and downloaded the latest source for libcurl and pycurl (it was actually pretty easy to build/install). This improved the situation: the progress function is now only called hundreds of times per second instead of thousands, but there is still a very noticeable performance hit when using the progress callback. To circumvent this, I set up the transfer like this:
# Set transfer parameters.
self.curlTransfer.fp = open(fileName, 'rb')
self.curlTransfer.fileSize = os.path.getsize(fileName)
self.curlTransfer.setopt(pycurl.URL, 'ftp://' + self.ipAddress + self.path + destination)
self.curlTransfer.setopt(pycurl.INFILESIZE_LARGE, self.curlTransfer.fileSize)
self.curlTransfer.setopt(pycurl.READDATA, self.curlTransfer.fp)
# Store file.
self.curlTransfer.perform()
And then if I want to get the progress in another thread:
def GetDataTransferred(self):
    """
    Gets the amount of data transferred for the current file transfer.
    @return Amount of data transferred (MB).
    """
    try:
        try:
            # tell() reports how far pycurl has read into the file.
            return float(self.curlTransfer.fp.tell()) / float(myConstants.MB)
        except ValueError:
            # tell() raises ValueError once the file has been closed, i.e. the
            # transfer is finished, so report the full file size (in MB).
            if self.curlTransfer.fileSize:
                return float(self.curlTransfer.fileSize) / float(myConstants.MB)
            return 0
    except Exception:
        Warning("Unable to get the amount of data transferred.")
        return 0
Basically I cheat and use the file object's tell() to see how far into the file pycurl has read.
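For example, a monitoring thread can just poll it while perform() runs elsewhere (rough sketch; _MonitorTransfer and the transferDone flag are made-up names for this example):
import threading
import time

def _MonitorTransfer(wrapper, totalMB, interval=0.5):
    # Poll the wrapper from a separate thread while perform() blocks in another one.
    while not wrapper.transferDone:  # transferDone is a hypothetical "finished" flag
        print('%.1f / %.1f MB' % (wrapper.GetDataTransferred(), totalMB))
        time.sleep(interval)

# monitor = threading.Thread(target=_MonitorTransfer,
#                            args=(self, self.curlTransfer.fileSize / float(myConstants.MB)))
# monitor.start()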
Edit/Solved: I ended up fixing the bug in libcurl myself by modifying lib/progress.c as shown in the bug report thread linked in my OP (Imgur link). It sounds like the fix has been committed to their trunk source, but it isn't included in the latest release (7.37.1). The reason I went this route is that the cleanest way to stop a transfer is to return nonzero from the progress function. You can also abort from your pycurl.READFUNCTION (by returning pycurl.READFUNC_ABORT), but with an FTP upload that function gets called once per block (~16 KB) and is very slow; use pycurl.READDATA instead and provide a file object. Now I can cleanly stop a transfer, use the intended progress update mechanism, and keep the high performance of libcurl.
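To illustrate the abort path (abortRequested is my own flag; the pycurl-specific part is that a nonzero return from the progress callback makes perform() raise pycurl.error with E_ABORTED_BY_CALLBACK):
def __UpdateFileTransferProgress(self, downloadTotal, downloaded, uploadTotal, uploaded):
    # ... normal progress bookkeeping ...
    # Any nonzero return value tells libcurl to abort the transfer.
    return 1 if self.abortRequested else 0

# ... and around perform():
try:
    self.curlTransfer.perform()
except pycurl.error as e:
    if e.args[0] == pycurl.E_ABORTED_BY_CALLBACK:
        pass  # The transfer was stopped deliberately from the callback.
    else:
        raise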
Upvotes: 1