Reputation: 4807
I am using the following code to delete a large number of files in Python:
import os
from multiprocessing import Pool

def deleteFiles(loc):
    def Fn_deleteFiles(inp):
        [fn, loc] = [inp['fn'], inp['loc']]
        os.remove(os.path.join(loc, fn))

    p = Pool(5)
    for path, subdirs, files in os.walk(loc):
        if len(files) > 0:
            inpData = [{'fn':x, 'loc':loc} for x in files]
            p.map(Fn_deleteFiles, inpData)
    p.close()

if __name__ == '__main__':
    loc = r'C:\myDriveWithFilesToDelete'
    deleteFiles(loc)
I get the following error:
File "C:\Program Files\Python 3.5\lib\multiprocessing\reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'deleteFiles.<locals>.Fn_deleteFiles'
Upvotes: 1
Views: 1683
Reputation: 1724
The problem is that you are defining a function inside of another function. Fn_deleteFiles(inp) is defined inside of deleteFiles(loc), which means that Fn_deleteFiles(inp) only comes into existence when deleteFiles(loc) is run, and it cannot be reached by name from the module.
Internally, multiprocessing.pool.Pool() uses the pickle library to transfer function objects from this Python process to the new worker processes that it spawns. However, pickle does not serialize a function's code. It stores only the module and qualified name, so it fails if the function cannot be found again under that name at module level.
Here is a small demo that produces a similar error:
import pickle

def foo():
    def bar():
        return "Hello"
    return bar

bar = foo()
if __name__ == '__main__':
    s = pickle.dumps(bar)
This causes the same kind of error:
Traceback (most recent call last):
File ".../stacktest.py", line 10, in <module>
s = pickle.dumps(bar)
AttributeError: Can't pickle local object 'foo.<locals>.bar'
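To make the difference between a module-level function and a nested one more visible, here is a minimal sketch that compares the two. The names top_level, make_nested and can_pickle are made up for this illustration and are not part of the question's code.

```python
import pickle


def top_level():
    return "Hello"


def make_nested():
    def nested():
        return "Hello"
    return nested


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Catch both exception types pickle may raise for an object
        # it cannot look up by name.
        return False


if __name__ == '__main__':
    # pickle stores functions by qualified name; "<locals>" marks a name
    # that cannot be looked up again from the module.
    print(top_level.__qualname__)       # top_level
    print(make_nested().__qualname__)   # make_nested.<locals>.nested
    print(can_pickle(top_level))        # True when run as a script
    print(can_pickle(make_nested()))    # False
```

The "<locals>" part of the qualified name is the same marker that shows up in the error message above.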
To fix this error, you can use multiprocessing.pool.ThreadPool instead. It runs the workers as threads inside the same process, so nothing has to be pickled.
import os
from multiprocessing.pool import ThreadPool as Pool

def deleteFiles(loc):
    def Fn_deleteFiles(inp):
        [fn, loc] = [inp['fn'], inp['loc']]
        os.remove(os.path.join(loc, fn))

    p = Pool(5)
    for path, subdirs, files in os.walk(loc):
        if len(files) > 0:
            # Use path (the directory os.walk is currently in), not the
            # top-level loc, so files in subdirectories are found too.
            inpData = [{'fn':x, 'loc':path} for x in files]
            p.map(Fn_deleteFiles, inpData)
    p.close()

if __name__ == '__main__':
    loc = 'DriveWithFilesToDelete'
    deleteFiles(loc)
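If you want to try the ThreadPool approach without touching real data, something like the following sketch could be used. It builds a throwaway directory tree with tempfile and deletes the files in it. The names delete_files_threaded, remove_one and demo are hypothetical, and collecting all paths up front is my own simplification, not something the original code does.

```python
import os
import tempfile
from multiprocessing.pool import ThreadPool


def delete_files_threaded(root, workers=5):
    """Delete every file under root using a thread pool; return the count."""
    def remove_one(full_path):
        # A nested function is fine here: threads share memory,
        # so nothing is pickled.
        os.remove(full_path)
        return full_path

    # Join each file name with the directory it was found in.
    paths = [os.path.join(path, name)
             for path, _subdirs, files in os.walk(root)
             for name in files]

    with ThreadPool(workers) as pool:
        removed = pool.map(remove_one, paths)
    return len(removed)


def demo():
    # Build a temporary tree so that nothing real is deleted.
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "sub")
        os.mkdir(sub)
        for folder in (root, sub):
            for i in range(3):
                with open(os.path.join(folder, "f%d.txt" % i), "w") as fh:
                    fh.write("x")
        count = delete_files_threaded(root)
        remaining = sum(len(files) for _p, _d, files in os.walk(root))
        return count, remaining


if __name__ == '__main__':
    print(demo())  # (6, 0): six files deleted, none remaining
```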
Alternatively, you can define Fn_deleteFiles(inp) outside of deleteFiles(loc), at module level, to fix this issue.
WARNING: For reasons I don't understand, this version will hang inside the IDLE interpreter.
import os
from multiprocessing import Pool

def Fn_deleteFiles(inp):
    print("Delete", inp)
    [fn, loc] = [inp['fn'], inp['loc']]
    os.remove(os.path.join(loc, fn))

def deleteFiles(loc):
    p = Pool(5)
    for path, subdirs, files in os.walk(loc):
        if len(files) > 0:
            # Use path (the directory os.walk is currently in), not the
            # top-level loc, so files in subdirectories are found too.
            inpData = [{'fn':x, 'loc':path} for x in files]
            p.map(Fn_deleteFiles, inpData)
    p.close()

if __name__ == '__main__':
    loc = 'DriveWithFilesToDelete'
    deleteFiles(loc)
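As a possible simplification of the module-level approach, you may not need a wrapper function at all: os.remove is a built-in that pickle can pass by reference, so it can be handed to Pool.map directly once the full paths are known. This is only a sketch under that assumption. collect_paths and delete_files are hypothetical names, and the path is the same placeholder as above.

```python
import os
from multiprocessing import Pool


def collect_paths(root):
    """Return the full path of every file under root."""
    return [os.path.join(path, name)
            for path, _subdirs, files in os.walk(root)
            for name in files]


def delete_files(root, workers=5):
    """Delete every file under root with a process pool; return the count."""
    paths = collect_paths(root)
    # os.remove lives at module level, so it can be pickled by reference
    # and sent to the worker processes without a wrapper function.
    with Pool(workers) as pool:
        pool.map(os.remove, paths)
    return len(paths)


if __name__ == '__main__':
    loc = 'DriveWithFilesToDelete'  # placeholder path, as in the code above
    print(delete_files(loc), "files deleted")
```

The if __name__ == '__main__' guard matters here, since on Windows each worker process imports the main module again when it starts.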
Upvotes: 1