windowsgm

Reputation: 1616

Finding Primes Using Parallel

So I created the below method to find all the primes up to a certain number. Any suggestions as to how to speed it up?

I call it like so:

interval = (value + NOOFTHREADS - 1) / NOOFTHREADS;
int max = interval * NOOFTHREADS;

tickets = new List<int>(NOOFTHREADS);
for (int i = 1; i <= NOOFTHREADS; i++)
{
    tickets.Add(i * (max / NOOFTHREADS));
}

Enumerable.Range(1, NOOFTHREADS)
.AsParallel()
.ForAll(_ => findPrimes());

With some global variables:

static List<int> vals = new List<int>();
static List<int> tickets;
static int interval;

And the method:

public static void findPrimes()
{
    int myTicket;
    lock (tickets)
    {
        myTicket = tickets.Last();
        tickets.RemoveAt(tickets.Count - 1);
    }
    var max = myTicket;
    int min = max - interval + 1;
    int num;

    var maxSquareRoot = Math.Sqrt(max);
    var eliminated = new System.Collections.BitArray(max + 1);
    eliminated[0] = true;
    eliminated[1] = true;
    for (int i = 2; i < (max) / 2; i++)
    {
        if (!eliminated[i])
        {
            if (i < maxSquareRoot)
            {
                num = ((min + i - 1) / i) * i;

                if (num == i)
                    num = num + i;

                for (int j = num; j <= max; j += i)
                    eliminated[j] = true;
            }
        }
    }
    for (int b = (int)min; b < max; b++)
    {
        if (!eliminated[b])
            lock (vals)
                vals.Add(b);
    }
}

Upvotes: 3

Views: 3018

Answers (4)

Daniel Fischer

Reputation: 183968

The Sieve of Eratosthenes can be parallelised rather easily: you just have to split it into separate chunks and sieve each chunk on its own. You have started with that splitting, but haven't gone far enough to obtain good results. Let's see what problems there are in findPrimes().

var max = myTicket;
int min = max - interval + 1;
int num;

var maxSquareRoot = Math.Sqrt(max);
var eliminated = new System.Collections.BitArray(max + 1);

You create a new BitArray for each thread covering all numbers from 0 to max. For the thread sieving the first chunk, that's fine, but for later threads you're allocating much more memory than needed. With a high upper bound and many threads, that's a problem in itself: you're allocating roughly (NOOFTHREADS + 1) * limit / 2 bits in total where only about limit bits are needed. With fewer threads and/or lower limits, you're still deteriorating locality and will have more cache misses.

eliminated[0] = true;
eliminated[1] = true;
for (int i = 2; i < (max) / 2; i++)

You should stop the outer loop when i > maxSquareRoot. Beyond that point the loop body doesn't do anything productive anymore; it just performs one read and one or two checks. That doesn't take long per iteration, but doing it for all i from √max to max adds up if max is, e.g., 10^11. For the last chunk alone, that can take longer than a single-threaded, single-chunk sieve.

{
    if (!eliminated[i])

eliminated[i] can only ever be true for i >= min (or i < 2), a situation you will only encounter in the first chunk for i <= maxSquareRoot (unless the limit is ridiculously low). So in the other chunks you're also eliminating the multiples of 4, 6, 8, 9, 10, 12, 14, ...: a lot of wasted work.

    {
        if (i < maxSquareRoot)

In the case that maxSquareRoot happens to be a prime, you are not eliminating its square; the comparison ought to be <=.

        {
            num = ((min + i - 1) / i) * i;

            if (num == i)
                num = num + i;

            for (int j = num; j <= max; j += i)
                eliminated[j] = true;
        }
    }
}
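Putting both bound fixes together, a chunk sieve might look like this (a hypothetical helper, not the poster's code: the outer loop stops at √max instead of max / 2, and uses <= so that an integer prime square root still gets its square eliminated):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Sketch: sieve one chunk [min, max] with the corrected loop bounds.
static List<int> SieveChunk(int min, int max)
{
    var eliminated = new BitArray(max + 1);
    eliminated[0] = true;
    eliminated[1] = true;
    double maxSquareRoot = Math.Sqrt(max);
    for (int i = 2; i <= maxSquareRoot; i++)   // <= and sqrt bound, not max / 2
    {
        if (!eliminated[i])
        {
            int num = ((min + i - 1) / i) * i; // first multiple of i at or above min
            if (num == i)
                num += i;                      // never eliminate i itself
            for (int j = num; j <= max; j += i)
                eliminated[j] = true;
        }
    }
    var primes = new List<int>();
    for (int b = Math.Max(min, 2); b <= max; b++)
        if (!eliminated[b])
            primes.Add(b);
    return primes;
}
```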

Now, when the sieving is done, you step through the chunk of the BitArray

for (int b = (int)min; b < max; b++)
{
    if (!eliminated[b])
        lock(vals)
            vals.Add(b);
}

and whenever you find a prime, you lock the list vals and add the prime to it. If two or more threads ever finish sieving at about the same time, they will step on each other's toes there, and the locking and waiting will slow down the process further.

To reduce space usage, each thread should create a list of the primes up to maxSquareRoot and use that to eliminate the composites in its chunk, so that the BitArray needs only max - min + 1 bits. Each thread creating its own list duplicates some work, but since the upper limit here is small, it's not much additional work. I don't know how concurrent read access is handled; if it doesn't add synchronisation overhead, you could also work with a single list shared by all threads, but I doubt that would gain anything.

The code would look roughly thus:

List<int> sievePrimes = simpleSieve(maxSquareRoot);
// simpleSieve is a standard SoE returning a list of primes not exceeding its argument
var sieve = new System.Collections.BitArray(max - min + 1);
int minSquareRoot = (int)Math.Sqrt(min);
foreach (int p in sievePrimes)
{
    int num = p > minSquareRoot ? p * p : ((min + p - 1) / p) * p;
    num -= min;
    for (; num <= max - min; num += p)
    {
        sieve[num] = true;
    }
}
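For completeness, simpleSieve might look like this. The answer only specifies its contract (a standard Sieve of Eratosthenes returning all primes not exceeding its argument), so this is one possible sketch; it takes a double because the call above passes Math.Sqrt(max) directly:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// A possible simpleSieve: plain Sieve of Eratosthenes up to (int)limitD.
static List<int> simpleSieve(double limitD)
{
    int limit = (int)limitD;
    var composite = new BitArray(limit + 1);
    var primes = new List<int>();
    for (int i = 2; i <= limit; i++)
    {
        if (composite[i]) continue;
        primes.Add(i);
        // start crossing off at i*i; smaller multiples were crossed off earlier
        for (long j = (long)i * i; j <= limit; j += i)
            composite[(int)j] = true;
    }
    return primes;
}
```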

Now, to avoid the threads stepping on each other's toes when adding the primes to the list, each thread should build its own list of primes and append it in one step. (I'm not 100% sure that's faster than adding each prime under its own lock, but I'd be surprised if it weren't.)

List<int> primes = new List<int>();
for (int offset = 0; offset <= max - min; ++offset)
{
    if (!sieve[offset])
    {
        primes.Add(min + offset);
    }
}
lock(vals) vals.AddRange(primes);

(Also, vals should be created with an initial capacity of about the expected number of primes, to avoid a reallocation for each chunk.)
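By the prime number theorem, π(n) is roughly n / ln n, so the capacity could be estimated like this (a sketch; the limit value and the 1.15 headroom factor are illustrative choices, not from the answer):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sizing based on the prime number theorem: pi(n) ~ n / ln(n).
// The 1.15 factor adds headroom so the estimate doesn't fall short.
int limit = 100_000_000;  // overall upper bound (example value)
int capacity = (int)(1.15 * limit / Math.Log(limit));
var vals = new List<int>(capacity);
```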

Upvotes: 5

Onkelborg

Reputation: 4007

I think all your locks might be a problem; locks are quite expensive, so you should try to avoid them. I don't know the details of your algorithm, but I think you should try to remove the locks somehow. Could the tickets be passed in as input? Each thread could have its own output queue that you merge once everything has completed.
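That layout could be sketched like so (my illustration, not Onkelborg's code: the chunk index is the input instead of a locked ticket list, each task returns its own private list, and the lists are merged once at the end, so no locks are needed; FindPrimesInChunk is a stand-in using plain trial division just to make the sketch runnable):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int NOOFTHREADS = 4;
int limit = 1000;
int interval = (limit + NOOFTHREADS - 1) / NOOFTHREADS;

// Each parallel task sieves its own chunk and owns its own output list;
// the single merge afterwards replaces per-prime locking.
List<int> primes = Enumerable.Range(0, NOOFTHREADS)
    .AsParallel()
    .Select(chunk => FindPrimesInChunk(chunk * interval + 1,
                                       Math.Min((chunk + 1) * interval, limit)))
    .AsSequential()
    .SelectMany(list => list)
    .OrderBy(p => p)
    .ToList();

// Placeholder chunk worker (trial division); a real version would sieve.
static List<int> FindPrimesInChunk(int lo, int hi)
{
    var result = new List<int>();
    for (int n = Math.Max(lo, 2); n <= hi; n++)
    {
        bool prime = true;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) { prime = false; break; }
        if (prime) result.Add(n);
    }
    return result;
}
```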

Upvotes: 0

Matthew Layton

Reputation: 42330

There is already a lot of documentation regarding the performance of parallel versus serial code. Microsoft recommends that you read their documentation on the Parallel functionality before converting code to a parallel architecture.

Consider reading the following Q&As I have posted on here:

Nested Parallel.ForEach loops

Performance profiling in .NET

Upvotes: 0

Is there a reason for implementing that textbook square-root-bounded algorithm? Have a look at Prime Number Algorithm.

Upvotes: 0
