So, I have a script I'm working on to back up a large server directory of files to a number of FTP accounts/services/whatever (at the moment the poor secretary has a copy-and-paste document for doing this, but anyway, I'm close to having a working script to save her from that =D).
I haven't really messed around with threading or multiprocessing before, but I can't figure out how to get it to take the list of files and upload 'em all to the host 3-5 at a time (in this example, I'm trying 5, but I dunno what I'll decide on).
```python
import os, sys, subprocess, shutil, re, string, glob, tvdb_api, itertools, multiprocessing, ftplib

files = [os.path.join(r, f) for r, d, fs in os.walk(os.getcwd()) for f in fs if not f[0]=='.']

class FTP_Upload:
    def __init__(self, p=os.getcwd()):
        self.files_to_upload = sorted([f for f in files if os.path.split(f)[0] == p])
        self.target = raw_input("Enter the host you want to upload to: ")
        self.host = FTP('ftp.host1.com', 'user_name1', 'super_secret_password1') if self.target == 'host' else FTP('ftp.host2.com', 'user_name2', 'secret_password2') if self.target == 'host2' else None

    def upload_files(self, f):
        self.host.storbinary(('STOR /'+f.split('/')[-1]), open(f, 'rb'))

    def multiupload(self):
        p = multiprocessing.Pool(processes=5)
        p.map(self.upload_files(f), self.files_to_upload)

FTP_Upload().multiupload()
```
But this just uploads the last file in self.files_to_upload...
I tried making the file list an iterator instead:

```python
self.files_to_upload = iter(sorted([f for f in files if os.path.split(f)[0] == p]))
```
But no joy.
Thanks in advance for any help!
If I understand you correctly, this sort of thing can be done quite easily with multiprocessing. Just write a function that uploads one file, e.g.:
```python
def upload_one(filename):
    """This function uploads one file.
    Perhaps it is a wrapper around your Popen call?"""
```
and then use multiprocessing on a list of files:
```python
import multiprocessing

mylistoffiles = []  # Somehow generate your list of files to be uploaded.
X = 5  # the number of processes you want to use
pool = multiprocessing.Pool(processes=X)
pool.map(upload_one, mylistoffiles)
```
You can also play around with the `chunksize` argument of `Pool.map`, which will speed things up a little if the uploads are quick.
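A self-contained sketch of the pattern above, assuming Python 3.3+ (where `Pool` can be used in a `with` statement). `upload_one` and `upload_all` are hypothetical names, and this `upload_one` only sleeps instead of talking to a server. `Pool.map` splits the list into chunks of roughly `chunksize` items, sends each chunk to a worker as one task, and returns the results in input order; with `processes=5`, at most five uploads run at the same time.

```python
import multiprocessing
import time

def upload_one(filename):
    """Hypothetical stand-in for a real upload (e.g. an ftplib storbinary call)."""
    time.sleep(0.1)   # pretend the transfer takes a moment
    return filename   # report which file was handled

def upload_all(filenames, processes=5, chunksize=1):
    # At most `processes` uploads run at once. map() hands the items to the
    # workers `chunksize` at a time and returns the results in input order.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(upload_one, filenames, chunksize=chunksize)

if __name__ == '__main__':
    names = ['a.txt', 'b.txt', 'c.txt', 'd.txt']
    print(upload_all(names, processes=2, chunksize=2))
```

Larger chunks mean fewer round trips between the parent and the worker processes; `chunksize=1` hands out one file at a time, which tends to keep the workers evenly busy when some uploads take much longer than others.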
Of course, if you need to pass more information than just the filename, one really easy way to accomplish that would be to make your list of files a list of tuples and unpack them in the function.
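A sketch of the list-of-tuples idea, again assuming Python 3.3+. The function names and the `(remote_dir, local_path)` layout are illustrative assumptions, and `upload_one` only works out the remote path instead of uploading anything. Python 3.3 also added `Pool.starmap`, which unpacks each tuple into separate positional arguments for you.

```python
import multiprocessing
import os
import posixpath

def upload_one(job):
    # Each job is a (remote_dir, local_path) tuple; unpack it first.
    remote_dir, local_path = job
    # Stand-in for the real transfer: just work out where the file would go.
    return posixpath.join(remote_dir, os.path.basename(local_path))

def upload_all(remote_dir, paths, processes=5):
    # Bundle the shared extra argument with every file name.
    jobs = [(remote_dir, p) for p in paths]
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(upload_one, jobs)

if __name__ == '__main__':
    print(upload_all('/backup', ['/home/me/a.txt', '/home/me/b.txt'], processes=2))
    # ['/backup/a.txt', '/backup/b.txt']
```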
WARNING
Some might consider this bad practice since you're essentially using a map function for side-effects...
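One way to take the edge off the side-effects concern is to have the worker return a status, so the list that `map` hands back is actually worth looking at. In this Python 3 sketch (hypothetical names), `upload_one` only reads the file, standing in for the real transfer, and reports `(filename, ok, detail)`.

```python
def upload_one(filename):
    """Read the file the way a real upload would and report how it went,
    rather than relying only on the side effect."""
    try:
        with open(filename, 'rb') as fh:
            data = fh.read()  # a real uploader would send these bytes to the server
    except OSError as exc:
        return (filename, False, str(exc))
    return (filename, True, len(data))

def failed_files(results):
    """Pick out the names whose upload failed from a list of status tuples."""
    return [name for name, ok, _ in results if not ok]
```

In the real script you would collect `results = pool.map(upload_one, files)` and then retry or report whatever `failed_files(results)` returns.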
EDIT
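I think your problem is `p.map(self.upload_files(f), self.files_to_upload)`. I'm not familiar with `FTP` in Python, so I can't say for sure, but you want to pass a function as the first parameter to `p.map`. You're passing the output of the function instead. It's possible that you wrote a function which returns a function, but it doesn't look like it from the code above. (In Python 2, the loop variable `f` leaks out of your module-level `files` list comprehension, so `self.upload_files(f)` runs once, right away, on the last file `os.walk` found; that's the single upload you're seeing.)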
What you probably want is:
```python
p.map(self.upload_files, self.files_to_upload)
```
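A small illustration of the difference, using the builtin `map` so it runs without starting any processes. `Pool.map` behaves the same way here, except that the `TypeError` is raised in a worker and re-raised in the parent. `upload_files` below is a hypothetical stand-in that just returns a string.

```python
def upload_files(f):
    """Hypothetical stand-in for the real upload method."""
    return 'uploaded ' + f

files = ['a.txt', 'b.txt']

# Right: pass the function object itself; map calls it once per item.
results = list(map(upload_files, files))

# Wrong: upload_files('c.txt') runs immediately, exactly once, and map
# receives its return value (a string), which is not callable.
try:
    list(map(upload_files('c.txt'), files))
    error = None
except TypeError as exc:
    error = str(exc)

print(results)  # ['uploaded a.txt', 'uploaded b.txt']
print(error)
```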
In general, a call to `map` can be translated into a list comprehension as follows:

```python
map(function, iterable)
```

is almost equivalent to

```python
[function(i) for i in iterable]
```

(almost, because in Python 3.x `map` returns a lazy iterator rather than a list). Notice that with `map` you don't actually call the function yourself; you pass it.
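A quick check of both points in Python 3: the `map` object doesn't call the function until you iterate over it, and once iterated it produces the same values as the list comprehension. `record` is just an illustrative helper that logs its calls.

```python
calls = []

def record(x):
    """Square x, noting each call so we can see when it actually runs."""
    calls.append(x)
    return x * x

nums = [1, 2, 3]

lazy = map(record, nums)            # Python 3: a lazy map object
calls_after_map = list(calls)       # record() has not been called yet
via_map = list(lazy)                # iterating is what calls record()
via_comprehension = [record(i) for i in nums]

print(calls_after_map, via_map, via_comprehension)  # [] [1, 4, 9] [1, 4, 9]
```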
Final Edit (hopefully)
You're running into an (unfortunate) limitation of multiprocessing: all the objects that you send to the worker processes must be picklable. In Python 2, instance methods (methods bound to an instance of a class) are not picklable. One solution is to change the method into a regular module-level function, as follows:
```python
import os
import multiprocessing
from ftplib import FTP

# No longer an instance method -- just a regular function, which multiprocessing can pickle.
# Accepts a tuple and splits it as (login, filename).
def upload_files(inpt):
    login, f = inpt
    # An open FTP connection (socket plus file objects) can't be pickled either,
    # so each worker opens its own connection from the login details.
    host = FTP(*login)
    try:
        with open(f, 'rb') as fh:
            host.storbinary('STOR /' + os.path.basename(f), fh)
    finally:
        host.quit()

files = [os.path.join(r, f) for r, d, fs in os.walk(os.getcwd()) for f in fs if not f.startswith('.')]
LOGINS = {'host': ('ftp.host1.com', 'user_name1', 'super_secret_password1'),
          'host2': ('ftp.host2.com', 'user_name2', 'secret_password2')}

class FTP_Upload:
    def __init__(self, p=os.getcwd()):
        self.files_to_upload = sorted(f for f in files if os.path.split(f)[0] == p)
        self.target = raw_input("Enter the host you want to upload to: ")
        self.login = LOGINS.get(self.target)

    def multiupload(self):
        p = multiprocessing.Pool(processes=5)
        upload_this = [(self.login, f) for f in self.files_to_upload]
        p.map(upload_files, upload_this)
        p.close()
        p.join()

if __name__ == '__main__':  # required on Windows, where workers re-import this module
    FTP_Upload().multiupload()
```
Hopefully that will work out for you. Good Luck!
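If you want to see which objects can cross into a worker process, a rough check is whether `pickle.dumps` accepts them, since multiprocessing pickles the function and arguments it sends (it uses its own `Pickler` subclass with a few extra rules, so treat this as an approximation). In this sketch (`is_picklable` is a hypothetical helper), a tuple of login strings pickles fine, while a lambda and an open file object do not. An open `ftplib.FTP` connection holds a socket and file objects, so sending the connection itself to the workers is likely to hit the same kind of error.

```python
import os
import pickle

def is_picklable(obj):
    """Return True if pickle.dumps accepts obj. multiprocessing pickles the
    function and arguments it sends to workers, so this is a rough pre-check."""
    try:
        pickle.dumps(obj)
    except Exception:  # PicklingError, TypeError, ... depending on the object
        return False
    return True

if __name__ == '__main__':
    login = ('ftp.host1.com', 'user_name1', 'super_secret_password1')
    print(is_picklable(login))        # True: plain tuples of strings are fine
    print(is_picklable(lambda x: x))  # False: lambdas can't be pickled
    with open(os.devnull, 'rb') as fh:
        print(is_picklable(fh))       # False: neither can open file objects
```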