Reputation: 303
Let's say I have a web bot, written in Python, that sends data via POST request to a web site. The data is pulled from a text file line by line and stored in an array. Currently, I'm testing each element in the array through a simple for loop. How can I effectively implement multithreading to iterate through the data more quickly? Let's say the text file is fairly large. Would attaching a thread to each request be smart? What do you think the best approach to this would be?
with open(r"c:\file.txt") as file:  # raw string so \f isn't treated as an escape
    dataArr = file.read().splitlines()

def test(data):
    # This next part is pseudo code
    result = testData('www.example.com', data)
    if result == 'whatever':
        print('success')

for data in dataArr:
    test(data)
I was thinking of something along the lines of this, but I feel it would cause issues depending on the size of the text file. I know there is software that lets the end user specify the number of threads to use when working with large amounts of data. I'm not entirely sure how that works, but that's something I'd like to implement; I've sketched my rough idea after the code below.
import threading

with open(r"c:\file.txt") as file:
    dataArr = file.read().splitlines()

def test(data):
    # This next part is pseudo code
    result = testData('www.example.com', data)
    if result == 'whatever':
        print('success')

jobs = []
for data in dataArr:
    # note the trailing comma: args must be a tuple
    thread = threading.Thread(target=test, args=(data,))
    jobs.append(thread)

for j in jobs:
    j.start()

for j in jobs:
    j.join()
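Something like the following is what I mean by a user-specified thread count: a fixed pool of worker threads pulling lines off a queue. This is only a rough sketch; test() is the same pseudo-code function as above, and NUM_THREADS is just a placeholder for the setting I'd want the user to control:

import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

NUM_THREADS = 8  # placeholder for a user-specified thread count

def worker(q):
    # each worker pulls lines off the queue until it sees the sentinel
    while True:
        data = q.get()
        if data is None:
            break
        test(data)

q = queue.Queue()
for data in dataArr:
    q.put(data)

threads = [threading.Thread(target=worker, args=(q,)) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()

for _ in range(NUM_THREADS):
    q.put(None)  # one sentinel per worker so they all exit
for t in threads:
    t.join()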
Upvotes: 2
Views: 2057
Reputation: 1462
Threads are slow in Python because of the Global Interpreter Lock (GIL). You should consider using multiple processes with the Python multiprocessing module instead of threads. Using multiple processes can increase the "ramp-up" time of your code, since spawning a real process takes more time than a light thread, but due to the GIL, threading won't do what you're after.
Here and here are a couple of basic resources on using the multiprocessing module. Here's an example from the second link:
import multiprocessing as mp
import random
import string

# Define an output queue
output = mp.Queue()

# Define an example function
def rand_string(length, output):
    """Generates a random string of numbers, lower- and uppercase chars."""
    rand_str = ''.join(random.choice(
                           string.ascii_lowercase
                           + string.ascii_uppercase
                           + string.digits)
                       for i in range(length))
    output.put(rand_str)

# Set up a list of processes that we want to run
processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(4)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]

print(results)
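For your script, you would not want one process per line of a large file any more than one thread per line; cap the worker count and feed the processes from a queue. Here is a minimal sketch of that idea, with a stub standing in for your pseudo-code testData() call and a made-up NUM_PROCS setting:

import multiprocessing as mp

NUM_PROCS = 4  # placeholder for a user-specified process count

def testData(url, data):
    # stub standing in for the real POST request from the question
    return 'whatever'

def worker(task_q, output_q):
    # pull items until the None sentinel arrives
    for data in iter(task_q.get, None):
        result = testData('www.example.com', data)
        output_q.put((data, result == 'whatever'))

if __name__ == "__main__":
    dataArr = ['a', 'b', 'c', 'd']  # stands in for the lines of the file
    task_q = mp.Queue()
    output_q = mp.Queue()

    workers = [mp.Process(target=worker, args=(task_q, output_q))
               for _ in range(NUM_PROCS)]
    for w in workers:
        w.start()

    for data in dataArr:
        task_q.put(data)
    for _ in range(NUM_PROCS):
        task_q.put(None)  # one sentinel per worker

    results = [output_q.get() for _ in dataArr]
    for w in workers:
        w.join()
    print(results)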
Upvotes: 1
Reputation: 7036
This sounds like a recipe for multiprocessing.Pool. See here: https://docs.python.org/2/library/multiprocessing.html#introduction
from multiprocessing import Pool

def test(num):
    if num % 2 == 0:
        return True
    else:
        return False

if __name__ == "__main__":
    list_of_datas_to_test = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    p = Pool(4)  # create 4 processes to do our work
    print(p.map(test, list_of_datas_to_test))  # distribute our work
Output looks like:
[True, False, True, False, True, False, True, False, True]
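Applied to your bot, something like the following should work, assuming a testData() helper that performs the actual POST (stubbed out here):

from multiprocessing import Pool

def testData(url, data):
    # stub standing in for the real POST request from the question
    return 'whatever'

def test(data):
    return testData('www.example.com', data) == 'whatever'

if __name__ == "__main__":
    with open(r"c:\file.txt") as f:
        dataArr = f.read().splitlines()
    p = Pool(4)                     # 4 worker processes, tune as needed
    results = p.map(test, dataArr)  # distribute the lines across workers
    p.close()
    p.join()
    print(results)

Pool.map blocks until every line has been processed and returns the results in input order.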
Upvotes: 2