Duncan Palmer
Duncan Palmer

Reputation: 2913

Multithreading only .4 of a second faster?

so for my programming class we have to do the following:

We have to measure the time it takes to count the occurences for both single threaded, and multi-threaded. Currently I average 9.3ms for single threaded, and 8.9 ms multithreaded with 8 threads on my 8 core cpu, why is this?

Currently for multithreading I have one array filled with numbers and am calculating lower and upper bounds for each thread to count occurences. here is my current attempt:

public void createThreads(int divisionSize) throws InterruptedException {

    threads = new Thread[threadCount];

    for(int i = 0; i < threads.length; i++) {

        final int lower = (i*divisionSize);
        final int upper = lower + divisionSize - 1;

        threads[i] = new Thread(new Runnable() {

            long start, end;
            @Override
            public void run() {


                start = System.nanoTime();

                for(int i = lower; i <= upper; i++) {
                    occurences[numbers[i]]++;
                }

                end = System.nanoTime();

                milliseconds += (end-start)/1000000.0;
            }

        });

        threads[i].start();
        threads[i].join();
    }

}

Could anyone shed some light? Cheers.

Upvotes: 1

Views: 137

Answers (4)

OldCurmudgeon
OldCurmudgeon

Reputation: 65811

You are essentially doing all the work sequentially because each thread you create you immediately join it.

Move the threads[i].join() outside the main construction loop into it's own loop. While you're at it you should probably also start all of the threads outside of the loop as starting them while new threads are still being created is not a good idea because creating threads takes time.

class ThreadTester {

    private final int threadCount;
    private final int numberCount;
    int[] numbers = new int[5_000_000];
    AtomicIntegerArray occurences;
    Thread[] threads;
    AtomicLong milliseconds = new AtomicLong();

    public ThreadTester(int threadCount, int numberCount) {
        this.threadCount = threadCount;
        this.numberCount = numberCount;
        occurences = new AtomicIntegerArray(numberCount);
        threads = new Thread[threadCount];
        Random r = new Random();
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = r.nextInt(numberCount);
        }
    }

    public void createThreads() throws InterruptedException {

        final int divisionSize = numbers.length / threadCount;
        for (int i = 0; i < threads.length; i++) {

            final int lower = (i * divisionSize);
            final int upper = lower + divisionSize - 1;

            threads[i] = new Thread(new Runnable() {

                @Override
                public void run() {

                    long start = System.nanoTime();

                    for (int i = lower; i <= upper; i++) {
                        occurences.addAndGet(numbers[i], 1);
                    }

                    long end = System.nanoTime();

                    milliseconds.addAndGet(end - start);
                }

            });

        }

    }

    private void startThreads() {
        for (Thread thread : threads) {
            thread.start();
        }
    }

    private void finishThreads() throws InterruptedException {
        for (Thread thread : threads) {
            thread.join();
        }
    }

    public long test() throws InterruptedException {
        createThreads();
        startThreads();
        finishThreads();
        return milliseconds.get();
    }
}

public void test() throws InterruptedException {
    for (int threads = 1; threads < 50; threads++) {
        ThreadTester tester = new ThreadTester(threads, 10);
        System.out.println("Threads=" + threads + " ns=" + tester.test());
    }
}

Note that even here the fastest solution is using one thread but you can clearly see that an even number of threads does it quicker as I am using an i5 which has 2 cores but works as 4 via hyperthreading.

with contention

Interestingly though - as suggested by @biziclop - removing all contention between threads via the occurrences by giving each thread its own `occurrences array we get a more expected result:

without contention

Upvotes: 4

LoganMzz
LoganMzz

Reputation: 1623

Use an ExecutorService with Callable and invoke all tasks then you can safely aggregate them. Also use TimeUnit for elapsing time manipulations (sleep, joining, waiting, convertion, ...)

Start by defining the task with his input/output :

class Task implements Callable<Task> {

  // input
  int[] source;
  int   sliceStart;
  int   sliceEnd;

  // output
  int[]  occurences = new int[10];
  String runner;
  long   elapsed = 0;

  Task(int[] source, int sliceStart, int sliceEnd) {
    this.source     = source;
    this.sliceStart = sliceStart;
    this.sliceEnd   = sliceEnd;
  }

  @Override
  public Task call() {
    runner = Thread.currentThread().getName();
    long start = System.nanoTime();
    try {
      compute();
    } finally {
      elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
    return this;
  }

  void compute() {
    for (int i = sliceStart; i < sliceEnd; i++) {
      occurences[source[i]]++;
    }
  }
}

Then let's define some variable to manage parameters:

// Parametters
int size     = 5_000_000;
int parallel = Runtime.getRuntime().availableProcessors();
int slices   = parallel;

Then generates random input:

// Generated source
int[] source = new int[size];
ThreadLocalRandom random = ThreadLocalRandom.current();
for (int i = 0; i < source.length; i++)  source[i] = random.nextInt(10);

Start timing total computation and prepare tasks:

long start = System.nanoTime();
// Prepare tasks
List<Task> tasks = new ArrayList<>(slices);
int sliceSize = source.length / slices;
for (int sliceStart = 0; sliceStart < source.length;) {
  int sliceEnd = Math.min(sliceStart + sliceSize, source.length);
  Task task = new Task(source, sliceStart, sliceEnd);
  tasks.add(task);
  sliceStart = sliceEnd;
}

Executes all task on threading configuration (don't forget to shutdown it !):

// Execute tasks
ExecutorService executor = Executors.newFixedThreadPool(parallel);
try {
  executor.invokeAll(tasks);
} finally {
  executor.shutdown();
}

Then task have been completed, just aggregate data:

// Collect data
int[] occurences = new int[10];
for (Task task : tasks) {
  for (int i = 0; i < occurences.length; i++) {
    occurences[i] += task.occurences[i];
  }
}

Finally you can output computation result:

// Display result
long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
System.out.printf("Computation done in %tT.%<tL%n", calendar(elapsed));
System.out.printf("Results: %s%n", Arrays.toString(occurences));

You can also output partial computations:

// Print debug output
int idxSize = (String.valueOf(size).length() * 4) / 3;
String template = "Slice[%," + idxSize + "d-%," + idxSize + "d] computed in %tT.%<tL by %s: %s%n";
for (Task task : tasks) {
  System.out.printf(template, task.sliceStart, task.sliceEnd, calendar(task.elapsed), task.runner, Arrays.toString(task.occurences));
}

Which gives on my workstation:

Computation done in 00:00:00.024
Results: [500159, 500875, 500617, 499785, 500017, 500777, 498394, 498614, 499498, 501264]
Slice[        0-1 250 000] computed in 00:00:00.013 by pool-1-thread-1: [125339, 125580, 125338, 124888, 124751, 124608, 124463, 124351, 125023, 125659]
Slice[1 250 000-2 500 000] computed in 00:00:00.014 by pool-1-thread-2: [124766, 125423, 125111, 124756, 125201, 125695, 124266, 124405, 125083, 125294]
Slice[2 500 000-3 750 000] computed in 00:00:00.013 by pool-1-thread-3: [124903, 124756, 124934, 125640, 124954, 125452, 124556, 124816, 124737, 125252]
Slice[3 750 000-5 000 000] computed in 00:00:00.014 by pool-1-thread-4: [125151, 125116, 125234, 124501, 125111, 125022, 125109, 125042, 124655, 125059]

the small trick to convert elapsed millis in a stopwatch calendar:

static final TimeZone UTC= TimeZone.getTimeZone("UTC");
public static Calendar calendar(long millis) {
  Calendar calendar = Calendar.getInstance(UTC);
  calendar.setTimeInMillis(millis);
  return calendar;
}

Upvotes: 1

Liviu Stirb
Liviu Stirb

Reputation: 6075

The join method allows one thread to wait for the completion of another, so the second thread will start only after the first will finish.

Join each thread after you started all threads.

  public void createThreads(int divisionSize) throws InterruptedException {

        threads = new Thread[threadCount];

        for(int i = 0; i < threads.length; i++) {

            final int lower = (i*divisionSize);
            final int upper = lower + divisionSize - 1;

            threads[i] = new Thread(new Runnable() {

                long start, end;
                @Override
                public void run() {


                    start = System.nanoTime();

                    for(int i = lower; i <= upper; i++) {
                        occurences[numbers[i]]++;
                    }

                    end = System.nanoTime();

                    milliseconds += (end-start)/1000000.0;
                }

            });

            threads[i].start();

        }

        for(int i = 0; i < threads.length; i++) {
            threads[i].join();
        }

    }

Also there seem to be a race condition in code at occurences[numbers[i]]++ So most probably if you update the code and use more threads the output wouldn't be correct. You should use an AtomicIntegerArray: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/AtomicIntegerArray.html

Upvotes: 1

biziclop
biziclop

Reputation: 49744

The other answers all explored the immediate problems with your code, I'll give you a different angle: one that's about design of multi-threading in general.

The idea of parallel computing speeding up calculations depends on the assumption that the small bits you broke the problem up into can indeed be run in parallel, independently of each other.

And at first glance, your problem is exactly like that, chop the input range up into 8 equal parts, fire up 8 threads and off they go.

There is a catch though:

occurences[numbers[i]]++;

The occurences array is a resource shared by all threads, and therefore you must control access to it to ensure correctness: either by explicit synchronization (which is slow) or something like an AtomicIntegerArray. But the Atomic* classes are only really fast if access to them is rarely contested. And in your case access will be contested a lot, because most of what your inner loop does is incrementing the number of occurrences.

So what can you do?

The problem is caused partly by the fact that occurences is such a small structure (an array with 10 elements only, regardless of input size), threads will continuously try to update the same element. But you can turn that to your advantage: make all the threads keep their own separate tally, and when they all finished, just add up their results. This will add a small, constant overhead to the end of the process but will make the calculations go truly parallel.

Upvotes: 2

Related Questions