user1179939

Reputation: 1

Auto-discover a new grid node and share jobs with it when it joins in the middle of job execution

Initially I start three grid nodes, and my Java program has over 200 jobs, all of which I distribute to the grid nodes. When I run the application, one more node is started through Eclipse, and it also participates in executing the jobs. This means each node executes about 50 jobs in parallel. While all the nodes are executing their jobs, I start another node and want to hand over to it some of the jobs that are not yet complete.

How can this be done?

Upvotes: 0

Views: 138

Answers (2)

mojjj

Reputation: 625

My experience with GridGain is that the job definition matters a great deal for processing time. If you generate small jobs, the communication overhead is large and everything may slow down (you may also run into other problems, such as the result collection cache size or timeouts). If, on the other hand, you make the jobs too large, slow nodes may hold up the whole task while other nodes sit idle. Finding the best job size is difficult.

Job stealing can help distribute jobs better AFTER they have been sent to the grid nodes. The following code limits the number of jobs a node processes at once and lets worker nodes steal any queued jobs above a configured number. The same configuration can also be done in an XML file.
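To experiment with job size, one option is to group the individual work items into batches and submit each batch as one job. The following is only a minimal sketch in plain Java: `JobBatcher`, `split`, and `batchSize` are hypothetical names and not part of the GridGain API, and the right batch size has to be found by measuring.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GridGain: groups work items into batches
// so that each batch can be submitted to the grid as a single job.
final class JobBatcher {

    // Splits the items into consecutive batches of at most batchSize items.
    // The last batch may be smaller than the others.
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

A smaller `batchSize` gives a node that joins late more pieces to pick up, at the cost of more communication per item.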

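// Package names as in GridGain 3.x (an assumption; adjust to your version).
import org.gridgain.grid.GridConfigurationAdapter;
import org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi;

public class ConfigGrid {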

        // config jobStealing
        public static GridConfigurationAdapter JobStealing(
                    GridConfigurationAdapter cfg, 
                    int waitJobsThreshold, 
                    int activeJobsThreshold, 
                    boolean stealingEnabled) 
        {
            GridJobStealingCollisionSpi spi = new GridJobStealingCollisionSpi();

            // Configure number of waiting jobs
            // in the queue for job stealing.
            spi.setWaitJobsThreshold(waitJobsThreshold);

            // Configure stealing attempts number.
            spi.setMaximumStealingAttempts(10);

            // Configure number of active jobs that are allowed to execute
            // in parallel. This number should usually be equal to the number
            // of threads in the pool (default is 100).
            spi.setActiveJobsThreshold(activeJobsThreshold);

            // Enable stealing.
            spi.setStealingEnabled(stealingEnabled);

            // Override default Collision SPI.
            cfg.setCollisionSpi(spi);

            return cfg;
        }
}

In your main function, you then call it like this:

GridConfigurationAdapter cfg = new GridConfigurationAdapter();
// config job stealing
cfg = ConfigGrid.JobStealing(cfg, numberOfJobs, setActiveJobs, stealingEnabled);
GridFactory.start(cfg);

For more settings, read the documentation for GridJobStealingCollisionSpi.

(Edit: of course, you must use the same settings on each node.)

Upvotes: 0

Nikita Ivanov

Reputation: 414

To migrate jobs that are in mid-execution, you need to make your jobs listen for topology events and react by stopping some of the jobs (in the split) and migrating them using a combination of checkpoint and custom failover SPIs.

It takes more than a few lines of code and is a rather advanced use case. I would look at making jobs shorter in duration and/or smaller in "size" to make better use of a changing topology.
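The stop-and-resume idea can be illustrated without any grid API. The sketch below is plain Java and rests on assumptions: `ResumableJob`, `requestStop`, and `run` are hypothetical names, the returned index stands in for a saved checkpoint, and `requestStop` stands in for whatever a topology event listener would call when a node joins. It does not show GridGain's checkpoint or failover SPIs.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical class, not part of GridGain: a job that can be asked to stop
// between items and later be resumed elsewhere from the returned index.
final class ResumableJob {

    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    // Stand-in for the reaction to a topology event (for example, a node joining).
    void requestStop() {
        stopRequested.set(true);
    }

    // Processes items from startIndex onward until done or until a stop is requested.
    // Returns the index of the first unprocessed item, which acts as the checkpoint;
    // a return value of items.size() means the job finished.
    <T> int run(List<T> items, int startIndex, Consumer<? super T> work) {
        int next = startIndex;
        while (next < items.size() && !stopRequested.get()) {
            work.accept(items.get(next));
            next++;
        }
        return next;
    }
}
```

A second `ResumableJob` on the new node would then call `run` with the saved index to finish the remaining items. Any partial result the job has accumulated would need to be saved along with the index.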

Upvotes: 1
