Jason S
Jason S

Reputation: 189906

data structures for scheduling workflow?

I'm wondering what kind(s) of data structures / algorithms might help facilitate handling the following situation; I'm not sure if I need a single FIFO, or a priority queue, or multiple FIFOs.

I have N objects that must proceed through a predefined workflow. Each object must complete step 1, then step 2, then step 3, then step 4, etc. Each step is either done quickly or involves a "wait" that depends on something external to finish (like the completion of a file operation or whatever). Each object maintains its own state. If I had to define an interface for these objects, it would be something like this (written below in pseudo-Java, but this question is language-agnostic):

public interface TaskObject
{
   public enum State { READY, WAITING, DONE };
   // READY = ready to execute next step
   // WAITING = awaiting some external condition
   // DONE = finished all steps

   public int getCurrentStep();
   // returns # of current step

   public int getEndStep();
   // returns # of step which is the DONE case.

   public State getState();
   // checks state and returns it. 
   // multiple calls will always be identical, 
   // except WAITING which can transition to READY or DONE.

   public State executeStep();
   // if READY, executes next step and returns getState().
   // otherwise, returns getState().

}

I need to write a single-threaded scheduler that calls executeStep() on the "next" object. My problem is, I'm not sure exactly what technique I should use to determine what the "next" object is. I want it to be fair (first-come, first-serve for objects not in the WAITING state).

My gut call is to have 3 FIFOs, READY, WAITING and DONE. In the beginning all objects are placed in the READY queue, and the scheduler repeats a loop where it takes the first object off the READY queue, calls executeStep(), and places it onto the queue that's appropriate the the result of executeStep(). Except that items in the WAITING queue need to be put into the READY or DONE queue when their state changes.... argh!

Any advice?

Upvotes: 0

Views: 8350

Answers (6)

MadH
MadH

Reputation: 1508

Take a look a this link.

Boost state machines vs uml

Boost has state machines. Why reinvent?

Upvotes: 0

florin
florin

Reputation: 14346

You might want to study the design of an operating system scheduler. Check out the Linux and *BSD for example.

Some pointers for the Linux scheduler: Inside the Linux scheduler and Understanding the Linux Kernel

Upvotes: 1

A. I. Breveleri
A. I. Breveleri

Reputation: 335

The simplest technique that satisfies the requirements in your question is to repeatedly iterate over all TaskObjects calling executeStep() on each one.

This requires only one construct to hold the TaskObjects, and it can be any iterable structure, e.g. an array.

Since a TaskObject can transition from WAITING to READY asynchronously, you have to poll every TaskObject that you don't know is DONE.

The performance gained from not polling the DONE TaskObjects may be negligible. It depends on the processing load of calling executeStep() on a DONE TaskObject, which should be small.

A simple round-robin polling assures that once a READY TaskObject has executed a step, it will not execute another step until all other TaskObjects have had a chance to execute.

One obvious additional requirement is detecting when all TaskObjects are in the DONE state so you can stop processing.

To avoid polling DONE TaskObjects you will need to either maintain a flag for each one, or chain the TaskObjects in two queues: READY/WAITING and DONE.

If you store the TaskObjects in an array, make it an array of records, with members DoneFlag and TaskObject.

If for some reason you are storing the TaskObjects in a queue, with available enqueue() and dequeue() methods, then the overhead of two queues instead of one may be small.

-Al.

Upvotes: 0

Alex
Alex

Reputation: 242

If this has to be single threaded you can use a single FIFO queue for the ready and waiting objects and use your thread to process each object as it comes out. If it's state changes to WAITING then simply stick it back into the queue and it will be reprocessed.

Something like (psuedocode):

var item = queue.getNextItem();
var state = item.executeStep ();
if (state == WAITING)
    queue.AddItem (item);
else if (state == DONE)
    // add to collection of done objects

Depending on the time executeStep takes to run you may need to introduce a delay (Sleep not for) to prevent a tight polling loop. Ideally you would have the objects publish state change events and do-away with the polling altogether.

This is the kind of timeslicing approach that was commonplace in hardware and comms software before multithreading was widespread.

Upvotes: 1

Greg Rogers
Greg Rogers

Reputation: 36459

You don't have any way for the task object to notify you when it changes from WAITING to READY except polling it, so the WAITING and READY queues could really just be one. You can just loop around it calling executeStep() on each one in turn. If as a return value from executeStep() you receive DONE, then you remove it from that queue and stick it on the DONE queue and forget about it.

If you wanted to give "more priority" towards READY objects and attempt to run through all possible READY objects before wasting any resources polling WAITING you can maintain 3 queues like you said and only process the WAITING queue when you have nothing in the READY queue.

I personally would spend some effort to eliminate the polling of the state, and instead define an interface that the object could use to notify your scheduler when a state changes.

Upvotes: 1

Tim
Tim

Reputation: 20360

NOTE - this does not address your question of how to schedule, but I would use a separate state class that defines the states and transitions. The objects should not know what states they should go through. They can be informed of what "Step" they are at, etc.

there are some patterns for that as well.

You should read up a little on operating systems - specifically the scheduler. Your example is a scaled down set of that problem and if you copy the relevant parts it should work great for you.

You can then add priority, etc.

Upvotes: 0

Related Questions