Reputation: 158
I am referring to this answer in order to handle multiple files at once using multiprocessing, but it stalls and doesn't work.
This is my attempt:
import multiprocessing
import glob
import json

def handle_json(file):
    with open(file, 'r', encoding='utf-8') as inp, open(file.replace('.json', '.txt'), 'a', encoding='utf-8', newline='') as out:
        length = json.load(inp).get('len', '')  # Note: each json file is not large and well formed
        out.write(f'{file}\t{length}\n')

p = multiprocessing.Pool(4)
for f, file in enumerate(glob.glob("Folder\\*.json")):
    p.apply_async(handle_json, file)
    print(f)
p.close()
p.join()  # Wait for all child processes to close.
Where exactly is the problem? I thought it might be because I have 3000 JSON files, so I copied just 50 into another folder and tried with those, but I got the same problem.
ADDED: Debugging with VS Code
Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: <module>)
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
File "C:\Users\admin\Desktop\F_New\stacko.py", line 10, in <module>
p = multiprocessing.Pool(4)
File "<string>", line 1, in <module> (Current frame)
Another ADD: here is a zip file containing the sample files and the code: https://drive.google.com/file/d/1fulHddGI5Ji5DC1Xe6Lq0wUeMk7-_J5f/view?usp=share_link
Upvotes: 1
Views: 280
Reputation: 17496
On Windows you have to guard your multiprocessing code with if __name__ == "__main__":, see Compulsory usage of if __name__=="__main__" in windows while using multiprocessing [duplicate].
You also need to call get on the tasks that you launched with apply_async in order to wait for them to finish, so you should store them in a list and call get on each of them.
After fixing those issues, your code would look as follows:
import multiprocessing
import glob
import json

def handle_json(file):
    with open(file, 'r', encoding='utf-8') as inp, open(file.replace('.json', '.txt'), 'a', encoding='utf-8', newline='') as out:
        length = json.load(inp).get('len', '')  # Note: each json file is not large and well formed
        out.write(f'{file}\t{length}\n')

if __name__ == "__main__":
    p = multiprocessing.Pool(4)
    tasks = []
    for f, file in enumerate(glob.glob("Folder\\*.json")):
        task = p.apply_async(handle_json, [file])
        tasks.append(task)
        print(f)
    for task in tasks:
        task.get()
    p.close()
    p.join()  # Wait for all child processes to close.
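As a side note (not part of the fix above), the same work can also be expressed with pool.map, which blocks until all files are processed and re-raises any worker exception, so the explicit task list is not needed. A minimal sketch, assuming the same handle_json worker and the same Folder layout:

import multiprocessing
import glob
import json

def handle_json(file):
    # Same worker as above: read one JSON file and append its 'len' field to a .txt file
    with open(file, 'r', encoding='utf-8') as inp, open(file.replace('.json', '.txt'), 'a', encoding='utf-8', newline='') as out:
        length = json.load(inp).get('len', '')
        out.write(f'{file}\t{length}\n')

if __name__ == "__main__":
    # map() distributes the file list across the worker processes and waits for completion
    with multiprocessing.Pool(4) as p:
        p.map(handle_json, glob.glob("Folder\\*.json"))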
Upvotes: 1
Reputation: 11060
The apply_async function in multiprocessing expects the arguments for the called function to be passed as an iterable (a list or tuple), so you need to do e.g.:
p.apply_async(handle_json, [file])
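For illustration (hypothetical filename, not from the question): the argument iterable is unpacked with func(*args), so passing a bare string splits it into individual characters, which raises a TypeError inside the worker. Because the question's code never calls get(), that error is swallowed and the script just appears to do nothing.

p.apply_async(handle_json, ['data.json'])  # calls handle_json('data.json')
p.apply_async(handle_json, 'data.json')    # calls handle_json('d', 'a', 't', ...) -> TypeError in the worker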
Upvotes: 1