user1367204

Reputation: 4797

MacOS: Why does Multiprocessing Queue.put stop working?

I have a pandas DataFrame with about 45,000 rows similar to:

from numpy  import random
from pandas import DataFrame

df = DataFrame(random.rand(45000, 200))

I am trying to break up all the rows into a multiprocessing Queue like this:

from multiprocessing import Queue

rows = [idx_and_row[1] for idx_and_row in df.iterrows()]

my_queue = Queue(maxsize = 0)

for idx, r in enumerate(rows):
    # print(idx)
    my_queue.put(r)

But when I run it, only about 37,000 items get put into my_queue before the program raises the following error:

    raise Full
queue.Full

What is happening and how can I fix it?

Upvotes: 2

Views: 776

Answers (2)

user1367204

Reputation: 4797

It seems that on Windows the maximum number of objects in a multiprocessing.Queue is unbounded, but on Linux and MacOS the maximum size is 32767, which is 2^15 − 1, the largest value the underlying semaphore (a signed 16-bit quantity there) can hold.
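One way to inspect that limit directly (this relies on the CPython-internal `_multiprocessing` module, so it is implementation-specific and may differ in other interpreters):

```python
import _multiprocessing  # CPython-internal module; implementation-specific

# The largest value the underlying semaphore can hold; this caps how many
# unconsumed items a multiprocessing.Queue can buffer on this platform.
print(_multiprocessing.SemLock.SEM_VALUE_MAX)  # 32767 on macOS
```

On Linux this typically prints a much larger value than on macOS, which is why the same code can behave differently across platforms.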

I solved the problem by creating an empty Queue object and passing it to all the processes I wanted to pass it to, plus one additional process. That additional process is responsible for filling the queue with 10,000 rows at a time, checking every few seconds to see whether the queue has been emptied; when it is empty, it adds another 10,000 rows. This way, all 45,000 rows are processed.
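A minimal sketch of that refill approach (the chunk size, sentinel handling, and the counting worker are illustrative, not from the original code):

```python
import time
from multiprocessing import Process, Queue

from numpy import random
from pandas import DataFrame

CHUNK = 10_000  # stay well below the 32767 semaphore limit

def feeder(rows, q, n_workers):
    """Refill the queue one chunk at a time, waiting for it to drain."""
    for start in range(0, len(rows), CHUNK):
        for row in rows[start:start + CHUNK]:
            q.put(row)
        while not q.empty():      # poll until the workers catch up
            time.sleep(0.5)
    for _ in range(n_workers):    # one sentinel per worker signals "no more rows"
        q.put(None)

def worker(q, results):
    """Count the rows received; a real worker would do the actual processing."""
    seen = 0
    while True:
        row = q.get()
        if row is None:
            break
        seen += 1
    results.put(seen)

if __name__ == "__main__":
    # Smaller than the post's 45,000 x 200 frame so the demo runs quickly.
    df = DataFrame(random.rand(1_000, 20))
    rows = [row for _, row in df.iterrows()]

    q, results = Queue(), Queue()
    workers = [Process(target=worker, args=(q, results)) for _ in range(2)]
    for w in workers:
        w.start()
    feeder(rows, q, n_workers=len(workers))
    for w in workers:
        w.join()
    print(sum(results.get() for _ in workers))  # 1000: every row was processed
```

Note that `q.empty()` is only a hint under concurrency, but for this top-up pattern an occasional early refill is harmless.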

Upvotes: 1

noxdafox

Reputation: 15040

The multiprocessing.Queue is designed for inter-process communication. It is not intended for storing large amounts of data. For that purpose, I'd suggest using Redis or Memcached.

Usually, the queue's maximum size is platform dependent, even if you set it to 0. There is no easy way to work around that.
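To use the queue purely for communication as described here, a common pattern is a small bounded queue that workers drain while the producer fills it; a blocking `put` then waits for a free slot instead of raising `queue.Full` (the worker logic and sizes below are illustrative):

```python
from multiprocessing import Process, Queue

def worker(q, out):
    """Sum the numbers received until a None sentinel arrives."""
    total = 0
    while True:
        item = q.get()
        if item is None:           # sentinel: producer is done
            break
        total += item
    out.put(total)

if __name__ == "__main__":
    q, out = Queue(maxsize=100), Queue()   # small buffer, not a data store
    procs = [Process(target=worker, args=(q, out)) for _ in range(4)]
    for p in procs:
        p.start()
    for i in range(45_000):        # far more items than the buffer holds
        q.put(i)                   # blocks when full instead of raising queue.Full
    for _ in procs:
        q.put(None)
    for p in procs:
        p.join()
    print(sum(out.get() for _ in procs))   # total of 0 + 1 + ... + 44,999
```

Because the buffer never holds more than 100 items, the platform's semaphore limit is never reached regardless of how many items pass through.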

Upvotes: 1
