Reputation: 161
I have a dataframe HH that looks like this:
end_Date latitude longitude start_Date
0 9/5/2014 41.8927 -90.4031 4/1/2014
1 9/5/2014 41.8928 -90.4031 4/1/2014
2 9/5/2014 41.8927 -90.4030 4/1/2014
3 9/5/2014 41.8928 -90.4030 4/1/2014
4 9/5/2014 41.8928 -90.4029 4/1/2014
5 9/5/2014 41.8923 -90.4028 4/1/2014
I am trying to parallelize my function using the multiprocessing package in Python. Here's what I wrote:
if __name__ == '__main__':
    pool = Pool(200)
    start = time.time()
    print "Hello"
    H = pool.map(funct_parallel, HH)
    pool.close()
    pool.join()
When I run this code, I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)
  File "C:\Users\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/Desktop/testparallel.py", line 198, in <module>
    H = pool.map(funct_parallel, HH)
  File "C:\Users\AppData\Local\Continuum\Anaconda2\lib\multiprocessing\pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Users\AppData\Local\Continuum\Anaconda2\lib\multiprocessing\pool.py", line 567, in get
    raise self._value
TypeError: string indices must be integers, not str
I am not sure where I am going wrong.
Upvotes: 0
Views: 402
Reputation: 42875
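pool.map requires an iterable as its second argument and feeds the items of that iterable to the function one at a time (see the docs). If you iterate over a DataFrame, you get the column names, not the rows. Each call to your function therefore receives a string, hence the complaint about string indices.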
for i in df:
    print(i)
end_Date
latitude
longitude
start_Date
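To see why this produces the TypeError in the question, here is a minimal sketch that replays what pool.map hands to the function, without a pool. The body of funct_parallel is an assumption (the question does not show it); it is only guessed to index its argument by column name.

```python
import pandas as pd

# Small stand-in for HH, built from two rows of the question's data.
HH = pd.DataFrame({
    "end_Date": ["9/5/2014", "9/5/2014"],
    "latitude": [41.8927, 41.8928],
    "longitude": [-90.4031, -90.4031],
    "start_Date": ["4/1/2014", "4/1/2014"],
})


def funct_parallel(item):
    # Hypothetical body: assumed to look up a column by name.
    return item["latitude"]


# pool.map(funct_parallel, HH) distributes the same items as list(HH),
# which are the column names.
items = list(HH)
print(items)

try:
    funct_parallel(items[0])  # equivalent to "end_Date"["latitude"]
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__, error)
```

The exact wording of the message differs between Python versions, but the exception type is the same as in the traceback.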
Instead, you need to break the DataFrame into pieces that can be processed in parallel by the pool, for instance by reading the file in chunks, as explained in the I/O docs.
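A rough sketch of that idea, assuming the DataFrame is already in memory: split it into row blocks and map a worker over the blocks. The names split_frame, process_chunk and run_parallel are hypothetical, and process_chunk is only a placeholder for whatever funct_parallel really does (here it counts the days between start_Date and end_Date).

```python
from multiprocessing import Pool

import pandas as pd


def split_frame(df, n_chunks):
    """Split df into at most n_chunks consecutive blocks of rows."""
    size = max(1, -(-len(df) // n_chunks))  # ceiling division
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]


def process_chunk(chunk):
    # Hypothetical stand-in for funct_parallel: it receives a small
    # DataFrame, so indexing by column name works here.
    chunk = chunk.copy()
    start = pd.to_datetime(chunk["start_Date"], format="%m/%d/%Y")
    end = pd.to_datetime(chunk["end_Date"], format="%m/%d/%Y")
    chunk["n_days"] = (end - start).dt.days
    return chunk


def run_parallel(df, n_workers=2):
    chunks = split_frame(df, n_workers)
    pool = Pool(n_workers)
    try:
        # Each worker call gets one DataFrame block, not a column name.
        results = pool.map(process_chunk, chunks)
    finally:
        pool.close()
        pool.join()
    return pd.concat(results)


if __name__ == "__main__":
    HH = pd.DataFrame({
        "end_Date": ["9/5/2014"] * 6,
        "latitude": [41.8927, 41.8928, 41.8927, 41.8928, 41.8928, 41.8923],
        "longitude": [-90.4031, -90.4031, -90.4030, -90.4030, -90.4029, -90.4028],
        "start_Date": ["4/1/2014"] * 6,
    })
    H = run_parallel(HH, n_workers=2)
    print(H)
```

The worker has to be defined at module level so the pool can pickle it, and on Windows the pool must be created under the `if __name__ == "__main__":` guard. If the data comes from a file, `pd.read_csv(path, chunksize=...)` yields DataFrame pieces directly and could replace split_frame.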
Upvotes: 1