Reputation: 22661
I wrote a simple python multiprocessing, in which it reads a bunch of lines from csv, calls an api and then writes to new csv. However, what I see is that performance of this program is same as sequential execution. Changing the pool size does not have any effect. What is going wrong?
from multiprocessing import Pool
from random import randint
from time import sleep
import csv
import requests
import json
def orders_v4(order_number):
response = requests.request("GET", url, headers=headers, params=querystring, verify=False)
return response.json()
newcsvFile=open('gom_acr_status.csv', 'w')
writer = csv.writer(newcsvFile)
def process_line(row):
ol_key = row['\ufeffORDER_LINE_KEY']
order_number=row['ORDER_NUMBER']
orders_json = orders_v4(order_number)
oms_order_key = orders_json['oms_order_key']
order_lines = orders_json["order_lines"]
for order_line in order_lines:
if ol_key==order_line['order_line_key']:
print(order_number)
print(ol_key)
ftype = order_line['fulfillment_spec']['fulfillment_type']
status_desc = order_line['statuses'][0]['status_description']
print(ftype)
print(status_desc)
listrow = [ol_key, order_number, ftype, status_desc]
#(writer)
writer.writerow(listrow)
newcsvFile.flush()
def get_next_line():
with open("gom_acr.csv", 'r') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
yield row
f = get_next_line()
t = Pool(processes=50)
for i in f:
t.map(process_line, (i,))
t.join()
t.close()
Upvotes: 2
Views: 77
Reputation: 13393
EDIT: I just noticed you call map
inside a loop. you need to call it only once. is is a blocking function, it is not async! check out the docs for examples of correct usage.
A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready.
Original answer:
The fact that all processes write to the output file causes file-system contention.
If your process_line
function would just return the rows (e.g. as a list of strings), then the main processes would write all of those after map
returned them all, then you should experience a performance boost.
also, 2 notes:
Upvotes: 2