Reputation: 4106
Suppose that you have a grid G of n x m cells, where n and m are huge.
Further, suppose that we have numerous tasks, where each task belongs to a single cell in G and should be executed in parallel (in a thread pool or other resource pool).
However, tasks belonging to the same cell must be executed serially; that is, each task must wait for the previous task in the same cell to finish.
How can I solve this? I've searched and tried several thread pools (Executors, Thread), but no luck.
Minimal Working Example
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class MWE {
public static void main(String[] args) {
ExecutorService threadPool = Executors.newFixedThreadPool(16);
Random r = new Random();
for (int i = 0; i < 10000; i++) {
int nx = r.nextInt(10);
int ny = r.nextInt(10);
Runnable task = new Runnable() {
public void run() {
try {
System.out.println("Task is running");
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
};
threadPool.submit(new Thread(task)); // Should use nx,ny here somehow
}
}
}
Upvotes: 2
Views: 2943
Reputation: 219
The Scala example below demonstrates how the keys of a map can be processed in parallel while the values of each key are processed serially. Translate it to Java syntax if you want to try it in Java (Scala uses the same JVM libraries). The basic idea is to chain each task onto the previous task's future so that tasks for the same key execute sequentially; a minimal Java sketch of the same idea follows the Scala listing.
import java.util.concurrent.{CompletableFuture, ExecutorService, Executors, Future, TimeUnit}
import scala.collection.concurrent.TrieMap
import scala.collection.mutable.ListBuffer
import scala.util.Random
/**
* For a given Key-Value pair with tasks as values, demonstrates sequential execution of tasks
* within a key and parallel execution across keys.
*/
object AsyncThreads {
val cachedPool: ExecutorService = Executors.newCachedThreadPool
var initialData: Map[String, ListBuffer[Int]] = Map()
var processedData: TrieMap[String, ListBuffer[Int]] = TrieMap()
var runningTasks: TrieMap[String, CompletableFuture[Void]] = TrieMap()
/**
* synchronous execution across keys and values
*/
def processSync(key: String, value: Int, initialSleep: Long) = {
Thread.sleep(initialSleep)
if (key.equals("key_0")) {
println(s"${Thread.currentThread().getName} -> sleep: $initialSleep. Inserting key_0 -> $value")
}
processedData.getOrElseUpdate(key, new ListBuffer[Int]).addOne(value)
}
/**
* parallel execution across keys
*/
def processASync(key: String, value: Int, initialSleep: Long) = {
val task: Runnable = () => {
processSync(key, value, initialSleep)
}
// 1. Chain the futures for sequential execution within a key
val prevFuture = runningTasks.getOrElseUpdate(key, CompletableFuture.completedFuture(null))
runningTasks.put(key, prevFuture.thenRunAsync(task, cachedPool))
// 2. Parallel execution across keys and values
// cachedPool.submit(task)
}
def process(key: String, value: Int, initialSleep: Int): Unit = {
//processSync(key, value, initialSleep) // synchronous execution across keys and values
processASync(key, value, initialSleep) // parallel execution across keys
}
def main(args: Array[String]): Unit = {
checkDiff()
0.to(9).map(kIndex => {
var key = "key_" + kIndex
var values = ListBuffer[Int]()
initialData += (key -> values)
1.to(10).map(vIndex => {
values += kIndex * 10 + vIndex
})
})
println(s"before data:$initialData")
initialData.foreach(entry => {
entry._2.foreach(value => {
process(entry._1, value, Random.between(0, 100))
})
})
cachedPool.awaitTermination(5, TimeUnit.SECONDS)
println(s"after data:$processedData")
println("diff: " + (initialData.toSet diff processedData.toSet).toMap)
cachedPool.shutdown()
}
def checkDiff(): Unit = {
var a1: TrieMap[String, List[Int]] = new TrieMap()
a1.put("one", List(1, 2, 3, 4, 5))
a1.put("two", List(11, 12, 13, 14, 15))
var a2: TrieMap[String, List[Int]] = new TrieMap()
a2.put("one", List(2, 1, 3, 4, 5))
a2.put("two", List(11, 12, 13, 14, 15))
println("a1: " + a1)
println("a2: " + a2)
println("check.diff: " + (a1.toSet diff a2.toSet).toMap)
}
}
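For reference, here is a minimal Java sketch of the same per-key chaining idea, assuming a ConcurrentHashMap that holds the tail future of each key's chain and a caller-owned thread pool (the class and method names are illustrative, not from the Scala code above):
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class KeyChains {
    private final ConcurrentHashMap<String, CompletableFuture<Void>> chains = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newCachedThreadPool();
    /** Tasks submitted under the same key run one after another; different keys run in parallel. */
    public void submit(String key, Runnable task) {
        chains.compute(key, (k, prev) -> {
            // Start a fresh chain for an unknown key, otherwise append to the existing tail.
            CompletableFuture<Void> tail =
                    (prev != null) ? prev : CompletableFuture.completedFuture(null);
            return tail.thenRunAsync(task, pool);
        });
    }
}
Because compute is atomic per key, two threads submitting tasks for the same key cannot append to the chain at the same time, and the map only ever stores the latest (tail) future of each chain.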
Upvotes: 0
Reputation:
This library should do the job: https://github.com/jano7/executor
int maxTasks = 16;
ExecutorService threadPool = Executors.newFixedThreadPool(maxTasks);
KeySequentialBoundedExecutor executor = new KeySequentialBoundedExecutor(maxTasks, threadPool);
Random r = new Random();
for (int i = 0; i < 10000; i++) {
int nx = r.nextInt(10);
int ny = r.nextInt(10);
Runnable task = new Runnable() {
public void run() {
try {
System.out.println("Task is running");
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
};
executor.execute(new KeyRunnable<>((ny * 10) + nx, task));
}
Upvotes: 0
Reputation: 73
You can create an array of n single-threaded executors (Executors.newFixedThreadPool(1)).
Then submit each task to the corresponding executor by using a hash function, e.g. threadPool[key % n].submit(task).
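A minimal sketch of this approach, reusing the loop from the question's MWE; the choice of 100 single-threaded executors (one per cell of the 10 x 10 grid) is illustrative:
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class CellExecutors {
    public static void main(String[] args) {
        int n = 100; // number of single-threaded executors (upper bound on parallelism)
        ExecutorService[] threadPool = new ExecutorService[n];
        for (int i = 0; i < n; i++) {
            threadPool[i] = Executors.newFixedThreadPool(1); // each executor runs its tasks serially
        }
        Random r = new Random();
        for (int i = 0; i < 10000; i++) {
            int nx = r.nextInt(10);
            int ny = r.nextInt(10);
            int key = ny * 10 + nx; // one key per cell
            // Same cell -> same key -> same single-threaded executor -> serial execution per cell.
            threadPool[key % n].submit(
                    () -> System.out.println("Task for cell (" + nx + "," + ny + ") is running"));
        }
        for (ExecutorService e : threadPool) {
            e.shutdown();
        }
    }
}
If n is smaller than the number of cells, several cells hash to the same executor; that still keeps each cell's tasks serial, it just caps how many cells can run in parallel.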
Upvotes: 2
Reputation: 6538
A callback mechanism with a synchronized block could work efficiently here.
I have previously answered a similar question here.
There are some limitations (see the linked answer), but it is simple enough to keep track of what is going on (good maintainability).
I have adapted the source code and made it more efficient for your case, where most tasks will be executed in parallel (since n and m are huge) but must occasionally run serially (when a task is for the same point in the grid G).
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.locks.ReentrantLock;
// Adapted from https://stackoverflow.com/a/33113200/3080094
public class GridTaskExecutor {
public static void main(String[] args) {
final int maxTasks = 10_000;
final CountDownLatch tasksDone = new CountDownLatch(maxTasks);
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(16);
try {
GridTaskExecutor gte = new GridTaskExecutor(executor);
Random r = new Random();
for (int i = 0; i < maxTasks; i++) {
final int nx = r.nextInt(10);
final int ny = r.nextInt(10);
Runnable task = new Runnable() {
public void run() {
try {
// System.out.println("Task " + nx + " / " + ny + " is running");
Thread.sleep(1);
} catch (Exception e) {
e.printStackTrace();
} finally {
tasksDone.countDown();
}
}
};
gte.addTask(task, nx, ny);
}
tasksDone.await();
System.out.println("All tasks done, task points remaining: " + gte.size());
} catch (Exception e) {
e.printStackTrace();
} finally {
executor.shutdownNow();
}
}
private final Executor executor;
private final Map<Long, List<CallbackPointTask>> tasksWaiting = new HashMap<>();
// make lock fair so that adding and removing tasks is balanced.
private final ReentrantLock lock = new ReentrantLock(true);
public GridTaskExecutor(Executor executor) {
this.executor = executor;
}
public void addTask(Runnable r, int x, int y) {
Long point = toPoint(x, y);
CallbackPointTask pr = new CallbackPointTask(point, r);
boolean runNow = false;
lock.lock();
try {
List<CallbackPointTask> pointTasks = tasksWaiting.get(point);
if (pointTasks == null) {
if (tasksWaiting.containsKey(point)) {
pointTasks = new LinkedList<CallbackPointTask>();
pointTasks.add(pr);
tasksWaiting.put(point, pointTasks);
} else {
tasksWaiting.put(point, null);
runNow = true;
}
} else {
pointTasks.add(pr);
}
} finally {
lock.unlock();
}
if (runNow) {
executor.execute(pr);
}
}
private void taskCompleted(Long point) {
lock.lock();
try {
List<CallbackPointTask> pointTasks = tasksWaiting.get(point);
if (pointTasks == null || pointTasks.isEmpty()) {
tasksWaiting.remove(point);
} else {
System.out.println(Arrays.toString(fromPoint(point)) + " executing task " + pointTasks.size());
executor.execute(pointTasks.remove(0));
}
} finally {
lock.unlock();
}
}
// for a general callback-task, see https://stackoverflow.com/a/826283/3080094
private class CallbackPointTask implements Runnable {
final Long point;
final Runnable original;
CallbackPointTask(Long point, Runnable original) {
this.point = point;
this.original = original;
}
@Override
public void run() {
try {
original.run();
} finally {
taskCompleted(point);
}
}
}
/** Amount of points with tasks. */
public int size() {
int l = 0;
lock.lock();
try {
l = tasksWaiting.size();
} finally {
lock.unlock();
}
return l;
}
// https://stackoverflow.com/a/12772968/3080094
public static long toPoint(int x, int y) {
return (((long)x) << 32) | (y & 0xffffffffL);
}
public static int[] fromPoint(long p) {
return new int[] {(int)(p >> 32), (int)p };
}
}
Upvotes: 1
Reputation: 3055
If I get you right, you want to execute X tasks (X is very big) in Y queues (Y is much smaller than X).
Java 8 has the CompletableFuture class, which represents an (asynchronous) computation. Basically, it's Java's implementation of a Promise. Here is how you can organize a chain of computations (generic types omitted):
// start the queue with a "completed" task
CompletableFuture queue = CompletableFuture.completedFuture(null);
// append a first task to the queue
queue = queue.thenRunAsync(() -> System.out.println("first task running"));
// append a second task to the queue
queue = queue.thenRunAsync(() -> System.out.println("second task running"));
// ... and so on
When you use thenRunAsync(Runnable), tasks will be executed using the default thread pool (there are other possibilities - see the API docs). You can also supply your own thread pool.
You can create Y such chains (possibly keeping references to them in some table).
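A minimal sketch of that table-of-chains idea, reusing the 10 x 10 grid and the 16-thread pool from the question; everything else is illustrative:
import java.util.Random;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class CellQueues {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        // One queue (chain of futures) per cell of the 10 x 10 grid, each started as "completed".
        CompletableFuture<?>[] queues = new CompletableFuture<?>[100];
        for (int i = 0; i < queues.length; i++) {
            queues[i] = CompletableFuture.completedFuture(null);
        }
        Random r = new Random();
        for (int i = 0; i < 10000; i++) {
            int nx = r.nextInt(10);
            int ny = r.nextInt(10);
            int cell = ny * 10 + nx;
            // Append to the cell's queue: same cell runs serially, different cells run in parallel.
            queues[cell] = queues[cell].thenRunAsync(
                    () -> System.out.println("Task for cell (" + nx + "," + ny + ") is running"),
                    pool);
        }
        // Wait for every queue to drain, then release the pool.
        CompletableFuture.allOf(queues).join();
        pool.shutdown();
    }
}
A flat array works because the grid in the MWE is small and dense; for a genuinely huge grid you would key the chains by cell in a map instead.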
Upvotes: 1
Reputation: 1312
This is where systems like Akka in the Java world make sense. If both X and Y are large, you may want to look at processing them with a message-passing mechanism rather than bunching them up in a huge chain of callbacks and futures. One actor holds the list of tasks to be done; it is handed a cell and eventually computes the result and persists it. If something fails in an intermediate step, it's not the end of the world.
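A rough sketch of how this could look with Akka's classic Java API, assuming one actor per cell so that each actor's mailbox enforces serial execution within a cell (the class and actor names are illustrative, not from the answer above):
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
public class GridActors {
    // An actor handles the messages in its mailbox one at a time,
    // so all tasks sent to the same cell actor run serially.
    static class CellActor extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(Runnable.class, Runnable::run)
                    .build();
        }
    }
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("grid");
        Map<Integer, ActorRef> cells = new HashMap<>();
        Random r = new Random();
        for (int i = 0; i < 10000; i++) {
            int nx = r.nextInt(10);
            int ny = r.nextInt(10);
            int cell = ny * 10 + nx;
            // One actor per cell; different actors are scheduled in parallel by the dispatcher.
            ActorRef cellActor = cells.computeIfAbsent(cell,
                    k -> system.actorOf(Props.create(CellActor.class), "cell-" + k));
            Runnable task = () -> System.out.println("Task for cell (" + nx + "," + ny + ") is running");
            cellActor.tell(task, ActorRef.noSender());
        }
        // The actor system keeps running until system.terminate() is called.
    }
}
The per-cell serial guarantee comes for free from the actor mailbox, while Akka's dispatcher runs different actors in parallel on a shared thread pool; supervision and persistence are where it adds value over a plain executor.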
Upvotes: 1