Reputation: 4797
I have a pandas DataFrame with about 45,000 rows similar to:
from numpy import random
from pandas import DataFrame
df = DataFrame(random.rand(45000, 200))
I am trying to put all of the rows into a multiprocessing Queue like this:
from multiprocessing import Queue
rows = [idx_and_row[1] for idx_and_row in df.iterrows()]
my_queue = Queue(maxsize = 0)
for idx, r in enumerate(rows):
    # print(idx)
    my_queue.put(r)
But when I run it, only about 37,000 items get put into my_queue
before the program raises the following error:
raise Full
queue.Full
What is happening and how can I fix it?
Upvotes: 2
Views: 776
Reputation: 4797
It seems that on Windows the maximum number of objects in a multiprocessing.Queue is effectively unlimited, but on macOS (and some other POSIX platforms) the maximum size is 32767, which is 2^15 - 1. The significance of that number is that it is the POSIX-guaranteed minimum for the OS semaphore limit SEM_VALUE_MAX, which is what actually caps the queue.
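If you want to see the cap on your own system, here is a quick check. Note that it relies on a CPython implementation detail: when maxsize <= 0, the Queue substitutes the OS limit SEM_VALUE_MAX.

from multiprocessing.synchronize import SEM_VALUE_MAX

# The value multiprocessing.Queue falls back to when maxsize <= 0;
# 32767 on macOS, typically much larger on Linux.
print(SEM_VALUE_MAX)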
I solved the program by making an empty Queue object and then passing it to all the processes I wanted to pass it to, plus another process. The additional process is responsible for filling the queue with 10,000 rows, and checking it every few seconds to see if the queue has been emptied. When its empty, another 10,000 rows are added. This way, all 45,000 row is processed.
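Here is a minimal sketch of the refilling approach; the chunk size, poll interval, worker count, and the per-row work are all illustrative rather than taken from my actual code:

import time
from multiprocessing import Process, Queue

from numpy import random
from pandas import DataFrame

CHUNK = 10_000
SENTINEL = None  # marker telling workers there is no more work

def feeder(rows, queue, n_workers):
    # Add rows in chunks, waiting for the workers to drain the queue
    # between chunks so the OS cap is never hit.
    for start in range(0, len(rows), CHUNK):
        for row in rows[start:start + CHUNK]:
            queue.put(row)
        while not queue.empty():  # empty() is approximate, but good enough here
            time.sleep(2)
    for _ in range(n_workers):
        queue.put(SENTINEL)

def worker(queue):
    while True:
        row = queue.get()
        if row is SENTINEL:
            break
        row.sum()  # placeholder for the real per-row processing

if __name__ == "__main__":
    df = DataFrame(random.rand(45000, 200))
    rows = [row for _, row in df.iterrows()]

    queue = Queue()
    workers = [Process(target=worker, args=(queue,)) for _ in range(4)]
    for w in workers:
        w.start()

    feed = Process(target=feeder, args=(rows, queue, len(workers)))
    feed.start()
    feed.join()
    for w in workers:
        w.join()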
Upvotes: 1
Reputation: 15040
The multiprocessing.Queue
is designed for inter-process communication; it is not intended for storing large amounts of data. For that purpose, I'd suggest using Redis or Memcached.
Usually, the queue's maximum size is platform dependent, even if you set it to 0. There is no easy way to work around that.
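For example, here is a minimal sketch of the Redis alternative using the redis-py client; it assumes a Redis server on localhost, and the key name "rows" and pickle serialization are arbitrary choices:

import pickle

import redis
from numpy import random
from pandas import DataFrame

df = DataFrame(random.rand(45000, 200))
r = redis.Redis()  # assumes a Redis server running on localhost:6379

# Producer: push each pickled row onto a Redis list, which has no
# semaphore-style size cap.
for _, row in df.iterrows():
    r.rpush("rows", pickle.dumps(row))

# Consumer (can run in any number of separate processes): pop rows
# until the list is empty.
while (payload := r.lpop("rows")) is not None:
    row = pickle.loads(payload)
    row.sum()  # placeholder for the real per-row work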
Upvotes: 1