Reputation: 857
I have a set of tasks, let's call it T[], where each task T[i] needs a certain amount of time t(T[i]) to be processed. The tasks are processed in parallel by X threads (that doesn't mean several threads co-operate on a single task; rather, each thread processes one task at a time, picks up the next one when it's done, and so on).
Now I want to calculate the expected overall time it will take to process all tasks. It's easy as long as size(T[]) <= X, of course (i.e. the number of tasks is less than or equal to the number of threads): in that case the overall time simply equals the time of the slowest task.
But I'm quite lost for the case X < size(T[])
(i.e. I have fewer threads than tasks). How would one calculate that in an elegant way?
edit: As asked by a commenter, we can assume the task queue is ordered longest-running task first, shortest-running task last. We can also assume there are no pauses between tasks, and we can neglect what the OS scheduler is doing.
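To give a concrete example with made-up numbers: with X = 2 threads and task times 4, 3, 3, the first thread takes the 4-unit task and the second thread takes the first 3-unit task; the second thread is free again at time 3 and picks up the remaining 3-unit task, so the overall time is 6, which is neither the slowest task (4) nor the total work divided by the thread count (10 / 2 = 5).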
Upvotes: 1
Views: 2239
Reputation: 58251
I assume that the tasks are scheduled in the order they're provided, and that each task goes to the first thread that's free. There's no meaningful non-determinism if these assumptions hold: a task may go to any of the free threads (if more than one is free), but the choice has no effect on the total running time.
In that case, we can simulate this using a min-heap of size X (where X is the number of threads), where each value in the heap is the time at which one of the threads becomes free. For each task, we pop the earliest-free thread off the heap, then push it back with the time at which it will finish that new task.
After we've scheduled all tasks, we can take the largest value in the heap, which will be the time at which all tasks are completed.
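For example, tracing the first sample call below by hand with tasks [3, 2, 1] and X = 2: the heap starts as [0, 0]; after the 3-unit task it is [0, 3]; after the 2-unit task it is [2, 3]; the 1-unit task goes to the thread that is free at time 2, giving [3, 3]; so the overall time is max = 3.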
This is relatively little code in Python:
import heapq

def compute_time(tasks, X):
    # Each heap entry is the time at which one of the X threads becomes free.
    threads = [0] * X
    for t in tasks:
        # Hand the task to the thread that becomes free earliest.
        heapq.heappush(threads, heapq.heappop(threads) + t)
    # Everything is done when the last thread becomes free.
    return max(threads)

print(compute_time([3, 2, 1], 2))              # 3
print(compute_time([5, 4, 3, 3, 2, 1, 1], 3))  # 7
Or in Java:
import java.util.*;

class Threads {
    public static void main(String[] args) {
        int totalTime1 = computeTotalTime(Arrays.asList(3, 2, 1), 2);
        System.out.println("totalTime1: " + totalTime1);
        int totalTime2 = computeTotalTime(Arrays.asList(5, 4, 3, 3, 2, 1, 1), 3);
        System.out.println("totalTime2: " + totalTime2);
    }

    static int computeTotalTime(List<Integer> tasks, int threads) {
        // Min-heap of the times at which the threads become free.
        PriorityQueue<Integer> q = new PriorityQueue<Integer>();
        for (int i = 0; i < threads; i++) q.add(0);
        // Each task goes to the thread that becomes free earliest.
        for (int t : tasks) q.add(q.poll() + t);
        // Drain the heap; the last value polled is the largest, i.e. the finish time.
        int max = 0;
        while (!q.isEmpty()) max = q.poll();
        return max;
    }
}
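For the second sample input, [5, 4, 3, 3, 2, 1, 1] on three threads, both versions report 7. The total work is 19 units, so no schedule can finish before ceil(19 / 3) = 7, which means the greedy assignment is as good as possible for this particular input.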
Upvotes: 5
Reputation: 857
Simulating the run is the solution when the execution order is (roughly) deterministic. I just took my real processing code and replaced it with a simple Thread.sleep, sleeping for the time the task is expected to take, but interpreted as milliseconds instead of seconds to scale it down. At the end I scaled the measured time back up, and the result is quite good: I ran it with nearly 100 tasks with vastly different execution times on 5 threads. It estimated 1 hour 39 minutes, and the real run was off by only 3 minutes.
// Needs: import static java.lang.System.currentTimeMillis;
//        import static java.util.concurrent.TimeUnit.MINUTES;
long startSim = currentTimeMillis();
List<Integer> taskTimes = parallelTests.getRuntimesForAllTests(); // ordered longest-running first
ThreadPoolExecutor simulationExecutor = (ThreadPoolExecutor) Executors.newFixedThreadPool(threadCount);
taskTimes.forEach(taskTime -> simulationExecutor.submit(() -> {
    try {
        Thread.sleep(taskTime); // the value is really seconds, but we sleep that many milliseconds
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}));
simulationExecutor.shutdown();
simulationExecutor.awaitTermination(1, MINUTES); // throws InterruptedException; the enclosing method declares it
long stopSim = currentTimeMillis();
long timeNeeded = stopSim - startSim;
// timeNeeded is the simulated duration in milliseconds; multiply it by 1000 to get the estimated
// real duration in milliseconds (equivalently, read the simulated milliseconds as real seconds).
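Since 1 ms of simulated sleep stands in for 1 s of real processing, the simulation runs roughly 1000 times faster than the real job: the 1 hour 39 minute run above took only about 6 seconds to simulate, which is also why the 1-minute awaitTermination timeout was enough in my case.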
Upvotes: 0