Georg M. Sorst

Reputation: 264

How can I get the list of scheduled jobs from Gearman?

I am currently evaluating Gearman to farm out some expensive data import jobs in our backend. So far this looks very promising. However, there is one piece missing that I just can't seem to find any info about: how can I get a list of scheduled jobs from Gearman?

I realize I can use the admin protocol to get the number of currently queued jobs for each function, but I need info about the actual jobs. There is also the option of using a persistent queue (e.g. MySQL) and querying the database for the jobs, but it feels pretty wrong to me to circumvent Gearman for this kind of information. Other than that, I'm out of ideas.

Maybe I don't need this at all :) So here's some more background on what I want to do; I'm open to better suggestions. Both the client and the worker run in PHP. In our admin interface the admins can trigger a new import for a client; as the import takes a while, it is started as a background task. The simple questions I want to be able to answer: When was the last import run for this client? Is an import already queued for this client (in which case triggering a new import should have no effect)? Nice to have: at which position in the queue is this job (so I can estimate when it will run)?

Thanks!

Upvotes: 4

Views: 3899

Answers (2)

paul.ago

Reputation: 4103

You have pretty much given yourself the answer: use an RDBMS (MySQL or PostgreSQL) as the persistence backend and query the gearman_queue table.

For instance, we developed a hybrid solution: when queuing the job, we generate a unique id for it and pass that id as the third parameter to doBackground() (http://php.net/manual/en/gearmanclient.dobackground.php).

We then use this id to query the gearman_queue table and check the job status via the 'unique_key' column. You can also get the queue position, since the records are already ordered.
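A sketch of what that hybrid could look like, assuming the pecl/gearman extension and the stock MySQL persistent-queue schema (a gearman_queue table with unique_key, function_name and priority columns); the actual table and column names depend on how gearmand was configured:

```php
<?php
// Queue a background job with a unique id we generate ourselves.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

$uniqueId = uniqid('import_', true);
$client->doBackground('import', json_encode(['client_id' => 42]), $uniqueId);

// Later: check whether the job is still queued, and at which position,
// by querying the persistence backend directly.
$pdo = new PDO('mysql:host=localhost;dbname=gearman', 'user', 'pass');
$stmt = $pdo->prepare(
    'SELECT unique_key FROM gearman_queue
     WHERE function_name = ? ORDER BY priority'
);
$stmt->execute(['import']);
$queued = $stmt->fetchAll(PDO::FETCH_COLUMN);

$position = array_search($uniqueId, $queued, true);
if ($position === false) {
    echo "Job is no longer in the queue (running or finished)\n";
} else {
    echo 'Job is at queue position ' . ($position + 1) . "\n";
}
```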

Bonus: we also catch exceptions inside the worker. If a job fails, we write the job payload (a JSON-serialized object) to a file; a cronjob then picks up the file and requeues the job, incrementing an internal 'retry' counter so that a single job is retried at most 3 times. That also lets us inspect the job later if it keeps failing.

Upvotes: 1

MatsLindh

Reputation: 52832

The admin protocol is what you'd usually use, but as you've discovered, it won't list the actual tasks in the queue. We've solved this by keeping track of the tasks we've started in our application layer, with a callback in our worker telling the application when a task has finished. This lets us perform cleanup, notifications, etc. when a task completes, and keeps that logic in the application rather than in the worker itself.
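The bookkeeping described above could look roughly like this; the import_jobs table (job_handle, client_id, finished_at) and the runImport() routine are hypothetical application-side pieces, not part of Gearman:

```php
<?php
// Application-layer bookkeeping: the worker updates our own table when
// a task finishes, so the admin interface can answer "when was the
// last import run for client X?".
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('import', function (GearmanJob $job) use ($pdo) {
    $payload = json_decode($job->workload(), true);
    runImport($payload['client_id']); // hypothetical import routine

    // Callback into the application layer: mark the task finished.
    $stmt = $pdo->prepare(
        'UPDATE import_jobs SET finished_at = NOW() WHERE job_handle = ?'
    );
    $stmt->execute([$job->handle()]);
});

while ($worker->work());
```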

As for progress, the best way is to use the built-in status mechanism in Gearman itself; in the PHP extension the worker calls $job->sendStatus($percentDone, 100). A client can then retrieve this value from the server using the job handle (which is returned when you start the job). That allows you to show the current progress to users in your interface.
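Both sides of that status exchange could be sketched like this (the 500-row loop stands in for whatever the real import does):

```php
<?php
// Worker side: report progress as numerator/denominator while importing.
function importWorker(GearmanJob $job)
{
    $rows = 500; // hypothetical number of records to import
    for ($i = 0; $i < $rows; $i++) {
        // ... import one record ...
        $job->sendStatus($i + 1, $rows);
    }
}

// Client side: poll the status using the handle doBackground() returned.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$handle = $client->doBackground('import', json_encode(['client_id' => 42]));

// jobStatus() returns [known, running, numerator, denominator].
list($known, $running, $done, $total) = $client->jobStatus($handle);
if ($known && $total > 0) {
    printf("Import is %d%% done\n", 100 * $done / $total);
}
```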

As long as you track the currently running tasks in your application, you can use that to answer whether a similar task is already running, but you can also use Gearman's built-in job coalescing / de-duplication; see the $unique parameter when adding a task.
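Using a deterministic $unique value per client is a simple way to get the "triggering a new import should have no effect" behavior from the question; a sketch:

```php
<?php
// De-duplication via the $unique parameter: jobs queued with the same
// unique key are coalesced by the server into a single job.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

$clientId = 42;
$unique = 'import-client-' . $clientId; // same client => same key

$client->doBackground('import', (string) $clientId, $unique);

// A second call with the same $unique while the first job is still
// queued or running does not create a second job on the server.
$client->doBackground('import', (string) $clientId, $unique);
```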

The position in the current queue will not be available through Gearman, so you'll have to do this in your application as well. I'd stay away from asking the Gearman persistence layer for this information.

Upvotes: 3
