faheemKurikkal

Reputation: 9

Concurrently downloading files in python using multi process

I wrote the code below to download files using pySmartDL. I would like to download more than one file at a time, so I tried to implement this with multiprocessing, but the second process starts only after the first one finishes. The code is below:

import time
from multiprocessing import Process
from pySmartDL import SmartDL, HashFailedException    
def down():
    dest='/home/faheem/Downloads'
    obj = SmartDL(url_100mb_file,dest, progress_bar=False,fix_urls=True)
    obj.start(blocking=False)
    #cnt=1
    while not obj.isFinished():
            print("Speed: %s" % obj.get_speed(human=True))
            print("Already downloaded: %s" % obj.get_dl_size(human=True))
            print("Eta: %s" % obj.get_eta(human=True))
            print("Progress: %d%%" % (obj.get_progress()*100))
            print("Progress bar: %s" % obj.get_progress_bar())
            print("Status: %s" % obj.get_status())
            print("\n"*2+"="*50+"\n"*2)
            print("SIZE=%s"%obj.filesize)
            time.sleep(2)

    if obj.isSuccessful():
            print("downloaded file to '%s'" % obj.get_dest())
            print("download task took %ss" % obj.get_dl_time(human=True))
            print("File hashes:")
            print(" * MD5: %s" % obj.get_data_hash('md5'))
            print(" * SHA1: %s" % obj.get_data_hash('sha1'))
            print(" * SHA256: %s" % obj.get_data_hash('sha256'))
            data=obj.get_data()
    else:
            print("There were some errors:")
            for e in obj.get_errors():
                    print(str(e))
    return
if __name__ == '__main__':
    #jobs=[]
    #for i in range(5):
    print 'Link1'
    url_100mb_file = ['https://softpedia-secure-download.com/dl/45b1fc44f6bfabeddeb7ce766c97a8f0/58b6eb0f/100255033/software/office/Text%20Comparator%20(v1.2).rar']
    Process(target=down()).start()
    print'link2'
    url_100mb_file = ['https://www.crystalidea.com/downloads/macsfancontrol_setup.exe']
    Process(target=down()).start()

Here link2 starts downloading only when link1 finishes, but I need both downloads to run concurrently. I would like to extend this approach to run up to 10 downloads at a time. Is multiprocessing a good fit for this, or is there a better, more memory-efficient method? I am a beginner, so please keep the answer simple. Regards

Upvotes: 0

Views: 2132

Answers (3)

9000

Reputation: 40884

Since your program is I/O-bound, you can use either multiprocessing or multithreading. Threads are usually the lighter option, because they share the memory of a single process.

Just in case, I'd like to remind you of the classic pattern for problems like this: have a queue of URLs from which worker processes or threads pull URLs to process, and a status queue onto which the workers push their progress reports or errors.

A thread pool or a process pool greatly simplifies things compared to managing the workers manually.
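
The pool pattern described above can be sketched roughly as follows. This is only an illustration, assuming Python 3 and its standard concurrent.futures module; the download function is a hypothetical placeholder that just sleeps, and it would need to be replaced by the real (blocking) download call. The example URLs are made up as well.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def download(url):
    # Hypothetical placeholder for the real work, e.g. a blocking download.
    time.sleep(0.1)
    return "finished %s" % url


def download_all(urls, max_workers=10):
    # The executor queues the pending URLs internally and runs at most
    # max_workers of them at any one time.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the error for this URL and keep the other jobs going.
                results[url] = "failed: %s" % exc
    return results


if __name__ == "__main__":
    example_urls = ["http://example.com/a", "http://example.com/b"]
    for url, status in download_all(example_urls).items():
        print(url, status)
```

Setting max_workers to 10 caps the number of simultaneous downloads at ten, which matches the limit mentioned in the question.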

Upvotes: 0

Carles Mitjans

Reputation: 4866

You can also use Python's threading module. Here is a little snippet that shows how it works:

import threading
import time

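def func(i):
    # Each worker sleeps for i seconds, then prints its number.
    time.sleep(i)
    print(i)

for i in range(1, 11):
    # Pass the function itself as target; its arguments go in args.
    thread = threading.Thread(target=func, args=(i,))
    thread.start()
    print("Launched thread " + str(i))

# Printed right away: the main thread does not wait for the workers here.
print("Done")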

Run this snippet and you will get a good idea of how it works: all ten threads are launched immediately, and each one prints its number once its sleep ends. Knowing that, you can run your own code the same way, passing the URL for each thread as an argument to the function.
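
As a rough sketch of that last step, assuming Python 3: here down is a hypothetical stand-in for the download function in the question, reduced to a short sleep, and run_all and the results dictionary are names made up for this illustration. Note that target receives the function itself. Writing target=down(url, results), with parentheses, would call the function on the spot and block until it returns, so nothing would overlap.

```python
import threading
import time


def down(url, results):
    # Hypothetical stand-in for the download loop in the question.
    time.sleep(0.1)
    results[url] = "done"


def run_all(urls):
    results = {}
    threads = []
    for url in urls:
        # target is the callable, args holds what it will be called with.
        t = threading.Thread(target=down, args=(url, results))
        t.start()
        threads.append(t)
    for t in threads:
        # Wait for every worker before reading the results.
        t.join()
    return results


if __name__ == "__main__":
    print(run_all(["http://example.com/a", "http://example.com/b"]))
```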

Hope that helps

Upvotes: 1

nico

Reputation: 2121

The particular library you're using appears to already support non-blocking downloads, so why not just do the following? Non-blocking means that start() returns immediately and the download carries on in the background, so your script is not held up while it runs.

from time import sleep
from pySmartDL import SmartDL 

links = [['https://softpedia-secure-download.com/dl/45b1fc44f6bfabeddeb7ce766c97a8f0/58b6eb0f/100255033/software/office/Text%20Comparator%20(v1.2).rar'],['https://www.crystalidea.com/downloads/macsfancontrol_setup.exe']]

objs = [SmartDL(link, progress_bar=False) for link in links]

for obj in objs:
    obj.start(blocking=False)

while not all(obj.isFinished() for obj in objs):
     sleep(1)
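
Once the loop above ends, each object can be checked for success, much as the question's code does for a single download. The helper below is a minimal sketch, assuming Python 3; wait_for_all is a hypothetical name, and it relies only on the methods already used in the question (isFinished, isSuccessful, get_dest and get_errors), so it works with any objects that provide them.

```python
import time


def wait_for_all(objs, poll_interval=1.0):
    # Poll until every download object reports that it has finished.
    while not all(obj.isFinished() for obj in objs):
        time.sleep(poll_interval)
    # Collect destination paths of successful downloads and the error
    # messages of failed ones.
    saved, failed = [], []
    for obj in objs:
        if obj.isSuccessful():
            saved.append(obj.get_dest())
        else:
            failed.append([str(e) for e in obj.get_errors()])
    return saved, failed
```

With the objs list from the snippet above, this could be called as saved, failed = wait_for_all(objs) after starting every download with blocking=False.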

Upvotes: 0
