Reputation: 59
I'm trying to read in a file, translate it using a remote API endpoint, then write the result to a file.
It was really slow because each request takes 2-3 seconds, so I've opted to use threads to speed up the translation by hitting the endpoint multiple times in parallel (as recommended in their API docs).
However, I'm having trouble coming up with a way to write the translated lines in the correct order. A race condition, I suppose. I think the issue is that I'm writing to a single file from multiple threads, so I would need a queue or something, but I have no idea how to approach it.
Main()
#Open File
for filename in os.listdir("files"):
    with open('translate/' + filename, 'w', encoding='UTF-8') as outFile:
        with open('files/' + filename, 'r', encoding='UTF-8') as f:
            count = 0
            #Replace Each Line
            with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
                future_to_url = (executor.submit(findMatch, line, count) for line in f)
                for future in concurrent.futures.as_completed(future_to_url):
                    print(future)
FindMatch()
def findMatch(line, count):
    count = count + 1 #Keep track of lines for debugging
    #Check if match in line
    if(re.search(pattern1, line) != None):
        #Translate each match in line. Depends on choice
        for match in re.findall(pattern1, line):
            #Filter out matches with no Japanese
            if(re.search(pattern2, match) != None and '$' not in match):
                if(choice == '1'):
                    match = match.rstrip()
                    print('Translating: ' + str(count) + ': ' + match)
                    translatedMatch = translate(match)
                    line = re.sub(match, translatedMatch, line, 1)
                elif(choice == '2'):
                    match = match.rstrip()
                    print('Translating Line: ' + str(count))
                    line = translate(line)
                    break #Don't want dupes
                else:
                    print('Bad Coder. Check your if statements')
        outFile.write(line)
    #Skip Line
    else:
        print('Skipping: ' + str(count))
        outFile.write(line)
Upvotes: 0
Views: 1317
Reputation: 1368
To write the lines in the correct order:
1. Iterate with for future in future_to_url so the futures are visited in submission order.
2. Use a list comprehension [executor.submit(...) for line in f] instead of the generator expression (executor.submit(...) for line in f). The list submits all lines to the executor at once. With a generator and a plain for loop, each task is submitted only when the loop reaches it, and future.result() then blocks before the next one is submitted, so nothing runs in parallel.
3. Have findMatch() return the result rather than write to the output directly.
When future.result() is called, it returns the result immediately if it is available; otherwise it blocks until the result is ready.
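The difference between submitting through a list and through a generator can be seen in a small timing sketch. This is only an illustration: slow_upper is a made-up stand-in for the slow remote call, and the 0.1 second delay is an arbitrary assumption.

```python
import concurrent.futures
import time


def slow_upper(text):
    # Hypothetical stand-in for a slow remote request (not the real translate()).
    time.sleep(0.1)
    return text.upper()


def run(lines, eager):
    """Return (results, elapsed seconds) for eager (list) or lazy (generator) submission."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        if eager:
            # List comprehension: every task is submitted before the loop starts.
            futures = [executor.submit(slow_upper, line) for line in lines]
        else:
            # Generator: each task is submitted only when the loop asks for it.
            futures = (executor.submit(slow_upper, line) for line in lines)
        # result() blocks until that particular future is done.
        results = [future.result() for future in futures]
    return results, time.perf_counter() - start


lines = ["a", "b", "c", "d"]
eager_results, eager_time = run(lines, eager=True)
lazy_results, lazy_time = run(lines, eager=False)
print("eager:", eager_results, round(eager_time, 2))
print("lazy: ", lazy_results, round(lazy_time, 2))
```

Both variants return the results in input order, but the lazy one waits for each request before submitting the next, so its elapsed time grows with the number of lines.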
import concurrent.futures
import os
import re

# pattern1, pattern2, choice and translate() are assumed to be defined
# elsewhere in the asker's program.

def main():
    # Open File
    for filename in os.listdir("files"):
        with open('translate/' + filename, 'w', encoding='UTF-8') as outFile:
            with open('files/' + filename, 'r', encoding='UTF-8') as f:
                # Replace Each Line
                with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
                    # The list comprehension submits all lines up front;
                    # enumerate() passes each task its actual line number
                    future_to_url = [executor.submit(findMatch, line, count)
                                     for count, line in enumerate(f, start=1)]
                    # as_completed yields futures in completion order, so use a
                    # plain for-loop to visit them in submission order instead
                    for future in future_to_url:
                        print(future.result())
                        # Uncomment to actually write to the output
                        # outFile.write(future.result())

def findMatch(line, count):
    # count is the line number, used only for debug output
    # Check if match in line
    if re.search(pattern1, line) is not None:
        # Translate each match in line. Depends on choice
        for match in re.findall(pattern1, line):
            # Filter out matches with no Japanese
            if re.search(pattern2, match) is not None and '$' not in match:
                if choice == '1':
                    match = match.rstrip()
                    print('Translating: ' + str(count) + ': ' + match)
                    translatedMatch = translate(match)
                    # Literal replacement, so regex characters in match are harmless
                    line = line.replace(match, translatedMatch, 1)
                elif choice == '2':
                    print('Translating Line: ' + str(count))
                    line = translate(line)
                    break  # Don't want dupes
                else:
                    print('Bad Coder. Check your if statements')
        return line
    # Skip Line
    else:
        print('Skipping: ' + str(count))
        return line
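As an alternative to keeping a list of futures, executor.map gives the same ordering guarantee with less code: it submits every item up front and yields the results in input order. A minimal sketch, assuming a hypothetical fake_translate in place of findMatch (the random sleep only simulates requests finishing out of order):

```python
import concurrent.futures
import random
import time


def fake_translate(line):
    # Hypothetical stand-in for findMatch(); the random delay makes
    # tasks finish in a different order than they were submitted.
    time.sleep(random.uniform(0.01, 0.05))
    return "[translated] " + line


def translate_lines(lines, max_workers=5):
    """Translate lines in parallel and return them in their original order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() collects the input immediately and yields results in input order.
        return list(executor.map(fake_translate, lines))


sample = ["line %d\n" % i for i in range(10)]
translated = translate_lines(sample)
print("".join(translated), end="")
```

In the real program the returned list could be passed straight to outFile.writelines, so only the main thread ever touches the output file.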
Upvotes: 2
Reputation: 11332
I think the simplest solution would be for findMatch to take a string as an argument and return its translation as a string. Your main program would then be responsible for sorting all the translations and printing them out in order.
Attempting to synchronize multiple threads all writing to a single file is a big mess.
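One way to sketch that idea is to have each task return its line index together with the translation, collect the pairs as they finish, and sort them in the main thread before writing. The names translate_indexed and translate_in_order are made up for this illustration, and the upper-casing with a random delay is only a placeholder for the real translation call.

```python
import concurrent.futures
import random
import time


def translate_indexed(index, line):
    # Hypothetical worker: returns the index with the result so the
    # main thread can restore the original order later.
    time.sleep(random.uniform(0.01, 0.05))
    return index, line.upper()


def translate_in_order(lines, max_workers=5):
    """Run translations in parallel, then sort the results by line index."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(translate_indexed, i, line)
                   for i, line in enumerate(lines)]
        # as_completed yields futures in whatever order they finish.
        finished = [f.result() for f in concurrent.futures.as_completed(futures)]
    finished.sort(key=lambda pair: pair[0])
    return [text for _, text in finished]


ordered = translate_in_order(["a\n", "b\n", "c\n"])
print("".join(ordered), end="")
```

Only the main thread writes output here, so no locking or queue is needed around the file.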
Upvotes: 2