Reputation: 303
Let's say I have a web bot, written in Python, that sends data via POST request to a web site. The data is pulled from a text file line by line and stored in an array. Currently, I'm testing each element in the array through a simple for loop. How can I effectively implement multithreading to iterate through the data more quickly? Let's say the text file is fairly large. Would attaching a thread to each request be smart? What do you think the best approach to this would be?
with open(r"c:\file.txt") as file:  # raw string so \f isn't treated as an escape
    dataArr = file.read().splitlines()

def test(data):
    # This next part is pseudo code
    result = testData('www.example.com', data)
    if result == 'whatever':
        print('success')

for data in dataArr:
    test(data)
I was thinking of something along the lines of this, but I feel it would cause issues depending on the size of the text file. I know there is software that lets the end user specify the number of threads to use when working with large amounts of data. I'm not entirely sure how that works, but that's something I'd like to implement; I've sketched my rough idea after the code below.
import threading

with open(r"c:\file.txt") as file:
    dataArr = file.read().splitlines()

def test(data):
    # This next part is pseudo code
    result = testData('www.example.com', data)
    if result == 'whatever':
        print('success')

jobs = []
for data in dataArr:
    # note the trailing comma: args must be a tuple
    thread = threading.Thread(target=test, args=(data,))
    jobs.append(thread)

for j in jobs:
    j.start()

for j in jobs:
    j.join()
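Something like the following is what I mean by a user-specified thread count: a fixed pool of worker threads pulling lines off a queue. This is only a rough sketch; test() is the same pseudo-code function as above, and NUM_THREADS is just a placeholder for the setting I'd want the user to control:

import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

NUM_THREADS = 8  # placeholder for a user-specified thread count

def worker(q):
    # each worker pulls lines off the queue until it sees the sentinel
    while True:
        data = q.get()
        if data is None:
            break
        test(data)

q = queue.Queue()
for data in dataArr:
    q.put(data)

threads = [threading.Thread(target=worker, args=(q,)) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()

for _ in range(NUM_THREADS):
    q.put(None)  # one sentinel per worker so they all exit
for t in threads:
    t.join()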
Upvotes: 2
Views: 2057
Reputation: 1462
Threads are slow in Python because of the Global Interpreter Lock (GIL). You should consider using multiple processes with the Python multiprocessing module instead of threads. Using multiple processes can increase the "ramp-up" time of your code, since spawning a real process takes more time than a light thread, but due to the GIL, threading won't do what you're after.
Here and here are a couple of basic resources on using the multiprocessing module. Here's an example from the second link:
import multiprocessing as mp
import random
import string

# Define an output queue
output = mp.Queue()

# Define an example function
def rand_string(length, output):
    """Generates a random string of numbers, lower- and uppercase chars."""
    rand_str = ''.join(random.choice(
                           string.ascii_lowercase
                           + string.ascii_uppercase
                           + string.digits)
                       for i in range(length))
    output.put(rand_str)

# Set up a list of processes that we want to run
processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(4)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]

print(results)
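For your script, you would not want one process per line of a large file any more than one thread per line; cap the worker count and feed the processes from a queue. Here is a minimal sketch of that idea, with a stub standing in for your pseudo-code testData() call and a made-up NUM_PROCS setting:

import multiprocessing as mp

NUM_PROCS = 4  # placeholder for a user-specified process count

def testData(url, data):
    # stub standing in for the real POST request from the question
    return 'whatever'

def worker(task_q, output_q):
    # pull items until the None sentinel arrives
    for data in iter(task_q.get, None):
        result = testData('www.example.com', data)
        output_q.put((data, result == 'whatever'))

if __name__ == "__main__":
    dataArr = ['a', 'b', 'c', 'd']  # stands in for the lines of the file
    task_q = mp.Queue()
    output_q = mp.Queue()

    workers = [mp.Process(target=worker, args=(task_q, output_q))
               for _ in range(NUM_PROCS)]
    for w in workers:
        w.start()

    for data in dataArr:
        task_q.put(data)
    for _ in range(NUM_PROCS):
        task_q.put(None)  # one sentinel per worker

    results = [output_q.get() for _ in dataArr]
    for w in workers:
        w.join()
    print(results)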
Upvotes: 1
Reputation: 7036
This sounds like a recipe for multiprocessing.Pool. See here: https://docs.python.org/2/library/multiprocessing.html#introduction
from multiprocessing import Pool

def test(num):
    if num % 2 == 0:
        return True
    else:
        return False

if __name__ == "__main__":
    list_of_datas_to_test = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    p = Pool(4)  # create 4 processes to do our work
    print(p.map(test, list_of_datas_to_test))  # distribute our work
Output looks like:
[True, False, True, False, True, False, True, False, True]
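Applied to your bot, something like the following should work, assuming a testData() helper that performs the actual POST (stubbed out here):

from multiprocessing import Pool

def testData(url, data):
    # stub standing in for the real POST request from the question
    return 'whatever'

def test(data):
    return testData('www.example.com', data) == 'whatever'

if __name__ == "__main__":
    with open(r"c:\file.txt") as f:
        dataArr = f.read().splitlines()
    p = Pool(4)                     # 4 worker processes, tune as needed
    results = p.map(test, dataArr)  # distribute the lines across workers
    p.close()
    p.join()
    print(results)

Pool.map blocks until every line has been processed and returns the results in input order.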
Upvotes: 2