pokey909

Reputation: 1837

Special case of producer/consumer

I'm trying to synchronize a special kind of producer/consumer problem. This is the problem:

I have 2 Queues link_queue, page_queue.

Thread class ProducePages_RequireLinks (call it class A), as the name says, consumes items from link_queue and puts an arbitrary number (>=1) of pages from each link into page_queue.

Conversely, the main thread class ProduceLinks_RequirePages (call it class B) consumes pages from page_queue and enqueues an arbitrary number (>=0) of links into link_queue.

Now it is possible that class B generates links faster than class A generates pages. On the other hand, the reverse is also possible.

How do I properly synchronize those threads in Ruby 1.9.2?

I tried to use monitors in both, but at some point I end up with deadlocks.

(If I failed to be precise, tell me via comments and I'll post some example classes.)


EDIT: A picture of what is going on:

[diagram omitted: class A threads and the class B thread exchanging items through link_queue and page_queue]

Examples

link_queue is initialized with 1 item; page_queue is initialized with 0 items.

We have 4 threads of class A and 1 thread of class B. Each line below is one timestep.


Thread A.1 grabs 1 link (linkQ=0) outputs 1 page (pageQ=1)

Thread B grabs 1 page (pageQ=0) outputs 400 links (linkQ=400)

Thread A.3 grabs 1 link (linkQ=399) outputs 1 page (pageQ=1)

Thread A.2 grabs 1 link (linkQ=398) outputs 1 page (pageQ=2)

Thread B grabs 1 page (pageQ=1) outputs 100 links (linkQ=498)

Thread A.1 grabs 1 link (linkQ=497) outputs 1 page (pageQ=2)

Thread A.4 grabs 1 link (linkQ=496) outputs 1 page (pageQ=3)

Thread B notices that linkQ is too large and waits until linkQ<16

. . . Threads A.* continue working . . . afterwards (linkQ=15) and (pageQ=484)

And now we have the opposite problem. Now Threads A have to wait until pageQ drops below a certain threshold. Otherwise we will run out of memory at some point.
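To make the structure concrete, here is a stripped-down sketch of the two classes (pages_for and links_in are stand-ins for the real work; note the plain Queues, which are exactly what grows without bound):

    require 'thread'

    link_queue = Queue.new
    page_queue = Queue.new
    link_queue << "http://example.com"   # the 1 initial link

    # Stand-ins for the real work:
    def pages_for(link); ["page of #{link}"]; end   # each link yields >= 1 pages
    def links_in(page); []; end                     # each page yields >= 0 links

    # Class A: four threads consuming links, producing pages
    4.times do
      Thread.new do
        loop do
          link = link_queue.pop          # blocks while link_queue is empty
          pages_for(link).each { |p| page_queue << p }
        end
      end
    end

    # Class B: the main thread consuming pages, producing links
    loop do
      page = page_queue.pop              # blocks while page_queue is empty
      links_in(page).each { |l| link_queue << l }
    end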

cheers

Upvotes: 3

Views: 834

Answers (2)

Alex D

Reputation: 30465

Whether you are using Ruby or any other language, whenever you have a producer-consumer design like the one you are describing, and no matter whether the producers and consumers are threads or processes, you must never assume that the consumers will be able to "keep up" with the producers. You must always use bounded queues. Even the external queues you mentioned in your comment do not solve the problem in the general case, because external storage, while much larger than RAM, is still not infinite.

The Ruby standard library has SizedQueue, which you get with require 'thread'. A SizedQueue is a thread-safe queue which is limited in size. If a producer thread tries to push an item onto the queue when it is already full, the producer will block until a consumer pops an item off the queue (making space for the new one). This will give the consumers a chance to "catch up". Likewise, if a consumer thread tries to pop an item off the queue when it is empty, the consumer will block until an item is available.
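For example, a minimal demonstration of that blocking behavior (the capacity of 4 and the sleep are arbitrary):

    require 'thread'

    queue = SizedQueue.new(4)   # holds at most 4 items

    producer = Thread.new do
      10.times do |i|
        queue.push(i)           # blocks whenever the queue already holds 4 items
        puts "produced #{i}"
      end
    end

    consumer = Thread.new do
      10.times do
        item = queue.pop        # blocks whenever the queue is empty
        sleep 0.1               # simulate a slow consumer
        puts "consumed #{item}"
      end
    end

    [producer, consumer].each(&:join)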

If overall throughput is limited by the producers, they will tend to get more CPU time (as the consumers block). On the other hand, if the consumers are the bottleneck, they will tend to get more CPU time. This is better than allowing the producers to burn system resources filling your queue with ever-growing numbers of items, while the consumers could be using those resources to work through the backlog.

Upvotes: 2

Slartibartfast

Reputation: 8805

From what you've said, it seems you have a feedback loop problem. So before I get to the synchronization part of the question, I have to ask: what is the scope of your problem?

If you are building a web crawler, the system you've described will try to enumerate every page on the Internet, and no amount of thread synchronization will help you fit that in RAM.

What about loops? Page a linking to pages b and c, page b linking to pages a and c, and so on? The way you described your problem, each iteration will grow the queue exponentially. If there is a finite set of pages you want to process, how large is it? If you are iterating through some set of pages over and over, should you skip pages that were processed recently enough?

To summarize: in order to keep this bounded, you have to ensure that, on average, one cycle produces at most one new cycle, by means of some links producing zero pages instead of one, and some pages producing zero links.
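For example, deduplicating with a shared "seen" set is one way to make repeated links produce zero pages (a sketch; enqueue_if_new is a made-up name):

    require 'set'
    require 'thread'

    link_queue = Queue.new
    seen       = Set.new
    seen_lock  = Mutex.new

    # Enqueue a link only if it has never been seen before;
    # duplicates are dropped, i.e. they "produce zero pages".
    def enqueue_if_new(link, queue, seen, lock)
      lock.synchronize do
        queue << link if seen.add?(link)   # Set#add? returns nil for duplicates
      end
    end

    enqueue_if_new("http://example.com/a", link_queue, seen, seen_lock)
    enqueue_if_new("http://example.com/a", link_queue, seen, seen_lock)   # dropped
    puts link_queue.size   # => 1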

Or alternatively: what are you actually trying to do? Another approach might be more appropriate.

Upvotes: 0
