Reputation: 33
I have this code:
import csv
import requests

configurationsFile = "file.csv"
configurations = []

def loadConfigurations():
    with open(configurationsFile) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=';')
        line_count = 0
        for row in csv_reader:
            url = row[0]
            line_count += 1
            configurations.append({"url": url})
        print(f'{line_count} urls loaded.')

loadConfigurations()

failedConfigs = []
session_requests = requests.session()

for config in configurations:
    try:
        "Do something with the url loaded from file.csv"
    except Exception as e:
        print(e)
        failedConfigs.append(config)

if len(failedConfigs) > 0:
    print("These errored out:")
    for theConfig in failedConfigs:
        print("ERROR: {}".format(theConfig['url']))
It reads urls from a csv file, then runs some code for each url listed in the file. The only "problem" is that if the csv file contains a lot of urls, it takes a long time to run through them all. So I'm looking for a way to process more than one url at a time.
I'm not that good with Python, so I don't even know if it's possible. But the question is: is there some way to tell the code to run, say, 5 urls at once instead of just 1?
Upvotes: 2
Views: 54
Reputation: 27557
You can use the threading.Thread class. Here is an example:
from threading import Thread

def read(file, start, end):
    with open(file, 'r') as r:
        for i, v in enumerate(r):
            if start <= i < end:
                print(v)

file = "file.txt"
t1 = Thread(target=read, args=(file, 0, 100))
t2 = Thread(target=read, args=(file, 100, 200))
t3 = Thread(target=read, args=(file, 200, 300))
t4 = Thread(target=read, args=(file, 300, 400))
t5 = Thread(target=read, args=(file, 400, 500))

t1.start()
t2.start()
t3.start()
t4.start()
t5.start()

t1.join()
t2.join()
t3.join()
t4.join()
t5.join()
Or use a loop:
from threading import Thread

def read(file, start, end):
    with open(file, 'r') as r:
        for i, v in enumerate(r):
            if start <= i < end:
                print(v)

file = "file.txt"
threads = []
for i in range(5):
    threads.append(Thread(target=read, args=(file, i * 100, (i + 1) * 100)))

for t in threads:
    t.start()
for t in threads:
    t.join()
Basically, the read() function defined above reads a file from line start to line end. The work is split into 5 segments so that 5 threads can read the file simultaneously.
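The fixed segments of 100 lines assume the file has exactly 500 lines. For an input of arbitrary length, the chunk boundaries can be computed from the item count instead; a small sketch (the chunk_ranges helper is illustrative, not part of the answer's code):

```python
def chunk_ranges(n, k):
    """Split n items into up to k near-equal (start, end) index ranges."""
    size = -(-n // k)  # ceiling division, so the last chunk may be short
    return [(i * size, min((i + 1) * size, n)) for i in range(k) if i * size < n]

print(chunk_ranges(12, 5))   # [(0, 3), (3, 6), (6, 9), (9, 12)]
print(chunk_ranges(500, 5))  # [(0, 100), (100, 200), (200, 300), (300, 400), (400, 500)]
```

Each tuple can then be passed as the (start, end) arguments of a thread's target.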
For your code, the loop
for config in configurations:
    try:
        "Do something with the url loaded from file.csv"
    except Exception as e:
        print(e)
        failedConfigs.append(config)
can be converted to a function that lets you specify from which index to which index of configurations you want to process:
def process(start, end):
    for i in range(start, end):
        config = configurations[i]
        try:
            "Do something with the url loaded from file.csv"
        except Exception as e:
            print(e)
            failedConfigs.append(config)
which you can then run from 5 threads:
threads = []
for i in range(5):
    threads.append(Thread(target=process, args=(i * 100, (i + 1) * 100)))

for t in threads:
    t.start()
for t in threads:
    t.join()
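One thing to be aware of: all 5 threads append to the shared failedConfigs list. In CPython, list.append happens to be atomic because of the GIL, but guarding shared state with a threading.Lock is the safer habit. A minimal sketch (worker, failed, and the simulated failure are illustrative):

```python
from threading import Lock, Thread

failed = []          # shared across threads, like failedConfigs above
failed_lock = Lock()

def worker(item):
    try:
        raise ValueError(item)   # stand-in for real work that may fail
    except Exception:
        with failed_lock:        # serialize access to the shared list
            failed.append(item)

threads = [Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(failed))  # [0, 1, 2]
```

The with failed_lock: block ensures only one thread mutates the list at a time, which stays correct even outside CPython.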
So you might end up with something like:
import csv
import requests
from threading import Thread

configurationsFile = "file.csv"
configurations = []

def loadConfigurations():
    with open(configurationsFile) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=';')
        line_count = 0
        for row in csv_reader:
            url = row[0]
            line_count += 1
            configurations.append({"url": url})
        print(f'{line_count} urls loaded.')

loadConfigurations()

failedConfigs = []
session_requests = requests.session()

def process(start, end):
    for i in range(start, end):
        config = configurations[i]
        try:
            "Do something with the url loaded from file.csv"
        except Exception as e:
            print(e)
            failedConfigs.append(config)

# Size each chunk from the actual url count instead of assuming exactly 500
chunk = -(-len(configurations) // 5)  # ceiling division

threads = []
for i in range(5):
    threads.append(Thread(target=process,
                          args=(i * chunk, min((i + 1) * chunk, len(configurations)))))

for t in threads:
    t.start()
for t in threads:
    t.join()

if len(failedConfigs) > 0:
    print("These errored out:")
    for theConfig in failedConfigs:
        print("ERROR: {}".format(theConfig['url']))
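Managing Thread objects and index arithmetic by hand works, but the standard library's concurrent.futures.ThreadPoolExecutor does the chunking, starting, and joining for you. A sketch with stand-in work and sample urls (handle, the sample configurations, and the simulated failure are illustrative, not part of the code above):

```python
from concurrent.futures import ThreadPoolExecutor

def handle(config):
    """Process one url; return the config on failure, None on success."""
    try:
        if config["url"].endswith("/3"):   # stand-in: pretend this url errors
            raise RuntimeError("simulated failure")
        return None
    except Exception as e:
        print(e)
        return config

configurations = [{"url": f"https://example.com/{i}"} for i in range(7)]  # sample data

with ThreadPoolExecutor(max_workers=5) as pool:   # at most 5 urls in flight at once
    results = list(pool.map(handle, configurations))

failedConfigs = [c for c in results if c is not None]
print(failedConfigs)  # [{'url': 'https://example.com/3'}]
```

max_workers=5 gives the "5 urls at once" behavior asked for, and pool.map preserves input order, so collecting failures needs no shared list or lock at all.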
Upvotes: 1