Chris
Chris

Reputation: 22257

Help speed up this algorithm? Sieve of Eratosthenes

I've written an algorithm that I believe to be correct for computing prime numbers up to n with the Sieve of Eratosthenes. Unfortunately, this program hangs on really large values of n (try 10 million). Here is what I've written...

Protected Function Eratosthenes(ByVal n As Integer) As String
    Dim maxValue As Integer = Math.Sqrt(n)
    Dim values As Generic.List(Of Integer) = New Generic.List(Of Integer)
    Dim i As Integer
    ''//create list.
    For i = 2 To n
        values.Add(i)
    Next

    For i = 2 To maxValue
        If values.Contains(i) Then
            Dim k As Integer
            For k = i + 1 To n
                If values.Contains(k) Then
                    If (k Mod i) = 0 Then
                        values.Remove(k)
                    End If
                End If
            Next
        End If
    Next

    Dim result As String = ""
    For i = 0 To values.Count - 1
        result = result & " " & values(i)
    Next

    Return result
End Function

How might I speed this algorithm up? Where are my bottlenecks?

Upvotes: 4

Views: 2283

Answers (12)

xpda
xpda

Reputation: 15813

This will do n=10,000,000 in .28 seconds, and n=100,000,000 in about 4 seconds. (VB 2008, Pentium e8500)

The slowest part was the string concatenation. VB is really slow concatenating large strings, so I used an integer array to save the primes.

Instead of doing a mod for each value, you can multiply and you only have to consider a fraction of the values.

VB did not optimize int(i/2) in the "for k" statement, so I moved it to the variable max2.

For large n, redimensioning the answer array became a bottleneck, so I added 100,000 elements each time it needed expanding.

As usual... I didn't check this thoroughly, so there may be bugs.

Protected Function Eratosthenes(ByVal n As Integer) As Integer()

Dim nPrime As Integer
Dim values(n) As Byte
Dim ans(1) As Integer

Dim maxValue As Integer = Math.Sqrt(n)
For i = 0 To n
    values(i) = 1
Next i

Dim max2 As Integer
For i = 2 To maxValue
    If values(i) <> 0 Then
        max2 = Int(n / i)
        For k = 2 To max2
            values(k * i) = 0
        Next k
    End If
Next i

nPrime = 0
For i = 2 To UBound(values)
    If values(i) <> 0 Then
      nPrime = nPrime + 1
      If nPrime > UBound(ans) Then ReDim Preserve ans(nPrime + 100000)
      ans(nPrime) = i
    End If
Next i

ReDim Preserve ans(nPrime)
Eratosthenes = ans

End Function

Upvotes: 0

Jaime
Jaime

Reputation: 67457

I think you are missing the key point of what makes sieving a great algorithmic tool...

The greatness of sieving is that it allows you to avoid doing the costly modulo operation: if you know p is a prime, to eliminate all of its multiples, rather than checking if all numbers in the sieve are divisible by p, you eliminate items 2p, 3p, 4p... Actually, the way the sieve of Erathostenes works, it can be shown that the first item that you will eliminate is p2, so you only need to check if p2, p2+p, p2+2p, p2+3p are there.

With my complete lack of knowledge about Visual Basic, this should take care of your main bottleneck:

Protected Function Eratosthenes(ByVal n As Integer) As String
    Dim maxValue As Integer = Math.Sqrt(n)
    Dim values As Generic.List(Of Integer) = New Generic.List(Of Integer)
    Dim i As Integer
    ''//create list.
    For i = 2 To n
        values.Add(i)
    Next

    For i = 2 To maxValue
        If values.Contains(i) Then
            Dim k As Integer
            For k = i * i To n Step i
                If values.Contains(k) Then
                    values.Remove(k)
                End If
            Next
        End If
    Next

    Dim result As String = ""
    For i = 0 To values.Count - 1
        result = result & " " & values(i)
    Next

    Return result
End Function

You may also want to consider, as has been posted by others, using an array of booleans to hold your sieve, and thus avoid all those Contains checks, which are probably very costly from a computational point of view.

Python is my language of choice, so even though it may not suit your needs, I have blogged about sieving for primes here, although you may also find this and this of interest...

Upvotes: 1

Jon Skeet
Jon Skeet

Reputation: 1502066

Removing elements from a large list is slow.

Why not create an array of Booleans values instead and set a value to "True" when you know that it's non-prime?

When you've found a new prime, you don't need to go through all higher values, just multiple of that one, setting the array element to True.

You can keep a separate list for primes you've found so far, if you want to return them.

Here's a C# implementation which just prints them out as it goes. (In C# if I wanted to return the values I'd return IEnumerable<T> and use an iterator block.)

using System;

public class ShowPrimes
{
    static void Main(string[] args)
    {
        ShowPrimes(10000000);
    }

    static void ShowPrimes(int max)
    {        
        bool[] composite = new bool[max+1];
        int maxFactor = (int) Math.Sqrt(max);

        for (int i=2; i <= maxFactor; i++)
        {
            if (composite[i])
            {
                continue;
            }
            Console.WriteLine("Found {0}", i);
            // This is probably as quick as only
            // multiplying by primes.
            for (int multiples = i * i;
                 multiples <= max;
                 multiples += i)
            {
                composite[multiples] = true;
            }
        }
        // Anything left is a prime, but not
        // worth sieving
        for (int i = maxFactor + 1; i <= max; i++)
        {
            if (composite[i])
            {
                continue;
            }
            Console.WriteLine("Found {0}", i);
        }
    }
}

Upvotes: 8

Joel Coehoorn
Joel Coehoorn

Reputation: 416039

If you're using .Net 3.5, you can re-write it like this:

Protected Function Eratosthenes(ByVal n As Integer) As String
    Dim values As List(Of Integer) = Enumerable.Range(0,n).ToList()

    For i = 2 To Math.Sqrt(n)
        If values(i) > 0 Then
            For k As Integer = i + 1 To n
                If values(k) AndAlso (k Mod i) = 0 Then
                    values(k) = 0
                End If
            Next
        End If
    Next

    Return string.Join(" ", values.Where(Function(i) i>1).Select(Function(i) i.ToString()).ToArray())
End Function

Upvotes: 1

John Nicholas
John Nicholas

Reputation: 4836

use a string builder.

also if its something that you really will use often then you probably want to pass an already known list of primes into it - that way you can reuse computation for existing result sets.

maybe convert the list of int to an array?

are you sure that contains terminates once the item is found?

maybe if you used an ordered prime list you can get a faster search algorithm to test for existing instead of a straight iteration from start to finish (ever back to front when you know its closer to end would be an advantage).

another method would be to multithread the loop so you can use multiple cores using a threadpool or a custom implementation to avoid starting and stopping threads. You would essentially be returning new primes into a pool that the function has a reference to.

Upvotes: 1

Amy B
Amy B

Reputation: 110151

Without changing the algorithm, here are some optimizations:

Dim result As String = ""
For i = 0 To values.Count - 1
    result = result & " " & values(i)
Next

This is Schlemiel the painter... Use string.Join instead.


values.Contains(i)
values.Contains(k)

These are expensive - Use HashSet instead of List to make it cheaper.


If values.Contains(k) Then
    If (k Mod i) = 0 Then
        values.Remove(k)
    End If
End If

Mod is way less expensive than Contains (even with HashSet). Do the Mod check first.

Upvotes: 1

Noon Silk
Noon Silk

Reputation: 55112

Creating an array of all the numbers up to the max number entered is madness; because by definition most of them won't be primes. So you're already holding at least 2x as much data as you need.

There are many well-known methods of looking for primes.

What are you trying to do? Print more than 10 million? How many more? What is the goal here?

If it's just to learn about primes, look into GCD and other fun things. (Chinese Remainder Theorem).

Upvotes: 0

Paul Keeble
Paul Keeble

Reputation: 1222

A big bottleneck in this code is the calling of values.Contains(i) and values.Contains(k) repeatedly on every integer candidate. One simple way to make your current approach faster would be to have a second list alongside the first which stores false/true against each number so that rather than doing a contains, an O(n) algorithm would allow a direct lookup which is O(1). The final loop in this case would then check for true as to whether the value should be included.

There are faster ways to solve the problem however. The traditional Eratosthenes algorithm is to generate numbers from 2 to sqrt(n) and test them against every prime known in a list of primes. If the number divides by any of them perfectly then it is thrown out and the next number is tried, else it is added as a prime in that list. This would likely go faster than the optimisation I suggested above.

Upvotes: 0

Dewfy
Dewfy

Reputation: 23634

It was proven that it is enough to checks prime only on interval 2..sqrt(maxValue)

Upvotes: 0

Juliet
Juliet

Reputation: 81526

Your bottle neck is the use of an List, plain and simple. List.contains is O(n), List.Remove(n).

You'll speed up your application by using a better data structure, namely a BitArray. Assume that items set to True are prime, and those which aren't composite. This means looking up, adding, or removing items from your prime set will be an O(1) operation.

Upvotes: 2

Daniel Elliott
Daniel Elliott

Reputation: 22867

Here is a C# implementation I threw together a while back.

Maybe it'll help!

using System;
using System.Collections.Generic;

namespace sieve050707
{
class MainClass
{
public static void Main(string[] args)
{
Console.WriteLine(”Sieve of Eratosthenes – 05 July 2007″);
int sampleSize = 100;

if (args.Length > 0)
{
sampleSize = Int32.Parse(args[0]);
}

List sampleData = new List();
List primesFound = new List();

for (int counter=2; counter < sampleSize; counter++)
{
sampleData.Add(counter);
}

for (int counter=2; counter < sampleSize; counter++)
{
if(sampleData.Contains(counter))
{
primesFound.Add(counter);
sampleData.Remove(counter);
for (int multiple=counter*2; multiple < sampleSize; multiple = multiple + counter)
{
sampleData.Remove(multiple);
}
}
}
foreach(int prime in primesFound)
{
Console.WriteLine(prime.ToString() + ” is prime”);
}
}
}
}

HTH,

Dan

Upvotes: 1

Khaled Alshaya
Khaled Alshaya

Reputation: 96879

I am not a VB guy, but I think:

values.Remove(k)

is a waste of time, just set the place to 0, then extract the prime numbers to another list or sort the same list and remove all zeros at once.

Upvotes: 4

Related Questions