Arvind

Reputation: 6474

Using parallel processing in Google App Engine for Java

I have a series of repetitive jobs, each of which requires visiting many different websites, anywhere from 100 to 10,000 sites per job.

From what I read in the Google documentation for the Task Queue API, a task can be used to send a request to an internal relative URL with some parameters (defined as part of the task).

What I want is to be able to control the flow, e.g. have one 'queue' in which only 50 sites are visited concurrently as part of one job, and a faster queue in which as many as 5000 sites are visited concurrently for one job.

How do I accomplish the above in Google App Engine for Java?

The only solution I could think of is a parallel processing framework like Korus, but that does not give me the level of control that Task Queues provide. Is there an easier and/or better way of accomplishing what I want?

Upvotes: 1

Views: 1259

Answers (1)

Jose Montes de Oca

Reputation: 879

Arvind,

This can be easily accomplished just by configuring your queues. Here is the relevant documentation on how to configure the processing rate: http://code.google.com/appengine/docs/java/config/queue.html#Defining_Push_Queues_and_Processing_Rates

In summary, several attributes help you control how your application processes tasks on a queue: rate (<rate>), bucket size (<bucket-size>), and max concurrent requests (<max-concurrent-requests>). Each of them lets you limit the processing rate. Bear in mind that App Engine uses a token bucket algorithm to control the rate of task execution.

For your first example, you can ensure that at most 50 sites are visited concurrently by setting <max-concurrent-requests>50</max-concurrent-requests> on that queue.

The other parameters just determine how quickly tasks are dispatched until the queue reaches its 50 concurrent requests.
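To give a feel for how rate and bucket size interact, here is a minimal token bucket sketch in plain Java. It is only an illustration of the general algorithm, not App Engine's actual implementation: the class name TokenBucket, the method tryAcquire, and the idea of passing the clock in as a number of seconds are all assumptions made for this example. The capacity plays the role of the bucket size (how large a burst can be) and the refill rate plays the role of the rate (how fast tokens come back).

```java
/**
 * Simplified token bucket, for illustration only.
 * Hypothetical class; it does not use or mirror any App Engine API.
 */
final class TokenBucket {
    private final double capacity;      // analogous to the queue's bucket size
    private final double refillPerSec;  // analogous to the queue's rate
    private double tokens;              // tokens currently available
    private double lastTime;            // time of the last update, in seconds

    TokenBucket(double capacity, double refillPerSec) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.tokens = capacity; // start full, so an initial burst is allowed
        this.lastTime = 0.0;
    }

    /**
     * Tries to start one task at time nowSec (seconds, supplied by the caller).
     * Returns true if a token was available and has been consumed.
     */
    boolean tryAcquire(double nowSec) {
        // Refill for the elapsed time, never exceeding the bucket capacity.
        double elapsed = Math.max(0.0, nowSec - lastTime);
        tokens = Math.min(capacity, tokens + elapsed * refillPerSec);
        lastTime = Math.max(lastTime, nowSec);

        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Counts how many of `attempts` tasks can start at time nowSec. */
    static int startedAt(TokenBucket bucket, double nowSec, int attempts) {
        int started = 0;
        for (int i = 0; i < attempts; i++) {
            if (bucket.tryAcquire(nowSec)) {
                started++;
            }
        }
        return started;
    }

    public static void main(String[] args) {
        // Bucket of 5 tokens, refilled at 4 tokens per second.
        TokenBucket bucket = new TokenBucket(5, 4.0);
        System.out.println("started at t=0.0s: " + startedAt(bucket, 0.0, 20));
        System.out.println("started at t=0.5s: " + startedAt(bucket, 0.5, 20));
        System.out.println("started at t=100s: " + startedAt(bucket, 100.0, 20));
    }
}
```

With a bucket of 5 and a refill rate of 4 per second, the full bucket allows a burst of 5 tasks, only 2 more can start half a second later, and after a long idle period the burst is capped at 5 again. The concurrency limit is a separate cap on top of this: even if tokens are available, no more than the configured number of requests run at once.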

Hope this helps!

Upvotes: 2
