Reputation: 22257
I've written an algorithm that I believe to be correct for computing prime numbers up to n with the Sieve of Eratosthenes. Unfortunately, this program hangs on really large values of n (try 10 million). Here is what I've written...
Protected Function Eratosthenes(ByVal n As Integer) As String
Dim maxValue As Integer = Math.Sqrt(n)
Dim values As Generic.List(Of Integer) = New Generic.List(Of Integer)
Dim i As Integer
''//create list.
For i = 2 To n
values.Add(i)
Next
For i = 2 To maxValue
If values.Contains(i) Then
Dim k As Integer
For k = i + 1 To n
If values.Contains(k) Then
If (k Mod i) = 0 Then
values.Remove(k)
End If
End If
Next
End If
Next
Dim result As String = ""
For i = 0 To values.Count - 1
result = result & " " & values(i)
Next
Return result
End Function
How might I speed this algorithm up? Where are my bottlenecks?
Upvotes: 4
Views: 2283
Reputation: 15813
This will do n=10,000,000 in .28 seconds, and n=100,000,000 in about 4 seconds. (VB 2008, Pentium e8500)
The slowest part was the string concatenation. VB is really slow concatenating large strings, so I used an integer array to save the primes.
Instead of doing a mod for each value, you can multiply and you only have to consider a fraction of the values.
VB did not optimize int(i/2) in the "for k" statement, so I moved it to the variable max2.
For large n, redimensioning the answer array became a bottleneck, so I added 100,000 elements each time it needed expanding.
As usual... I didn't check this thoroughly, so there may be bugs.
Protected Function Eratosthenes(ByVal n As Integer) As Integer()
Dim nPrime As Integer
Dim values(n) As Byte
Dim ans(1) As Integer
Dim maxValue As Integer = Math.Sqrt(n)
For i = 0 To n
values(i) = 1
Next i
Dim max2 As Integer
For i = 2 To maxValue
If values(i) <> 0 Then
max2 = Int(n / i)
For k = 2 To max2
values(k * i) = 0
Next k
End If
Next i
nPrime = 0
For i = 2 To UBound(values)
If values(i) <> 0 Then
nPrime = nPrime + 1
If nPrime > UBound(ans) Then ReDim Preserve ans(nPrime + 100000)
ans(nPrime) = i
End If
Next i
ReDim Preserve ans(nPrime)
Eratosthenes = ans
End Function
Upvotes: 0
Reputation: 67457
I think you are missing the key point of what makes sieving a great algorithmic tool...
The greatness of sieving is that it allows you to avoid doing the costly modulo operation: if you know p is a prime, to eliminate all of its multiples, rather than checking if all numbers in the sieve are divisible by p, you eliminate items 2p, 3p, 4p... Actually, the way the sieve of Erathostenes works, it can be shown that the first item that you will eliminate is p2, so you only need to check if p2, p2+p, p2+2p, p2+3p are there.
With my complete lack of knowledge about Visual Basic, this should take care of your main bottleneck:
Protected Function Eratosthenes(ByVal n As Integer) As String
Dim maxValue As Integer = Math.Sqrt(n)
Dim values As Generic.List(Of Integer) = New Generic.List(Of Integer)
Dim i As Integer
''//create list.
For i = 2 To n
values.Add(i)
Next
For i = 2 To maxValue
If values.Contains(i) Then
Dim k As Integer
For k = i * i To n Step i
If values.Contains(k) Then
values.Remove(k)
End If
Next
End If
Next
Dim result As String = ""
For i = 0 To values.Count - 1
result = result & " " & values(i)
Next
Return result
End Function
You may also want to consider, as has been posted by others, using an array of booleans to hold your sieve, and thus avoid all those Contains checks, which are probably very costly from a computational point of view.
Python is my language of choice, so even though it may not suit your needs, I have blogged about sieving for primes here, although you may also find this and this of interest...
Upvotes: 1
Reputation: 1502066
Removing elements from a large list is slow.
Why not create an array of Booleans values instead and set a value to "True" when you know that it's non-prime?
When you've found a new prime, you don't need to go through all higher values, just multiple of that one, setting the array element to True.
You can keep a separate list for primes you've found so far, if you want to return them.
Here's a C# implementation which just prints them out as it goes. (In C# if I wanted to return the values I'd return IEnumerable<T>
and use an iterator block.)
using System;
public class ShowPrimes
{
static void Main(string[] args)
{
ShowPrimes(10000000);
}
static void ShowPrimes(int max)
{
bool[] composite = new bool[max+1];
int maxFactor = (int) Math.Sqrt(max);
for (int i=2; i <= maxFactor; i++)
{
if (composite[i])
{
continue;
}
Console.WriteLine("Found {0}", i);
// This is probably as quick as only
// multiplying by primes.
for (int multiples = i * i;
multiples <= max;
multiples += i)
{
composite[multiples] = true;
}
}
// Anything left is a prime, but not
// worth sieving
for (int i = maxFactor + 1; i <= max; i++)
{
if (composite[i])
{
continue;
}
Console.WriteLine("Found {0}", i);
}
}
}
Upvotes: 8
Reputation: 416039
If you're using .Net 3.5, you can re-write it like this:
Protected Function Eratosthenes(ByVal n As Integer) As String
Dim values As List(Of Integer) = Enumerable.Range(0,n).ToList()
For i = 2 To Math.Sqrt(n)
If values(i) > 0 Then
For k As Integer = i + 1 To n
If values(k) AndAlso (k Mod i) = 0 Then
values(k) = 0
End If
Next
End If
Next
Return string.Join(" ", values.Where(Function(i) i>1).Select(Function(i) i.ToString()).ToArray())
End Function
Upvotes: 1
Reputation: 4836
use a string builder.
also if its something that you really will use often then you probably want to pass an already known list of primes into it - that way you can reuse computation for existing result sets.
maybe convert the list of int to an array?
are you sure that contains terminates once the item is found?
maybe if you used an ordered prime list you can get a faster search algorithm to test for existing instead of a straight iteration from start to finish (ever back to front when you know its closer to end would be an advantage).
another method would be to multithread the loop so you can use multiple cores using a threadpool or a custom implementation to avoid starting and stopping threads. You would essentially be returning new primes into a pool that the function has a reference to.
Upvotes: 1
Reputation: 110151
Without changing the algorithm, here are some optimizations:
Dim result As String = ""
For i = 0 To values.Count - 1
result = result & " " & values(i)
Next
This is Schlemiel the painter... Use string.Join instead.
values.Contains(i)
values.Contains(k)
These are expensive - Use HashSet instead of List to make it cheaper.
If values.Contains(k) Then
If (k Mod i) = 0 Then
values.Remove(k)
End If
End If
Mod is way less expensive than Contains (even with HashSet). Do the Mod check first.
Upvotes: 1
Reputation: 55112
Creating an array of all the numbers up to the max number entered is madness; because by definition most of them won't be primes. So you're already holding at least 2x as much data as you need.
There are many well-known methods of looking for primes.
What are you trying to do? Print more than 10 million? How many more? What is the goal here?
If it's just to learn about primes, look into GCD and other fun things. (Chinese Remainder Theorem).
Upvotes: 0
Reputation: 1222
A big bottleneck in this code is the calling of values.Contains(i) and values.Contains(k) repeatedly on every integer candidate. One simple way to make your current approach faster would be to have a second list alongside the first which stores false/true against each number so that rather than doing a contains, an O(n) algorithm would allow a direct lookup which is O(1). The final loop in this case would then check for true as to whether the value should be included.
There are faster ways to solve the problem however. The traditional Eratosthenes algorithm is to generate numbers from 2 to sqrt(n) and test them against every prime known in a list of primes. If the number divides by any of them perfectly then it is thrown out and the next number is tried, else it is added as a prime in that list. This would likely go faster than the optimisation I suggested above.
Upvotes: 0
Reputation: 23634
It was proven that it is enough to checks prime only on interval 2..sqrt(maxValue)
Upvotes: 0
Reputation: 81526
Your bottle neck is the use of an List, plain and simple. List.contains is O(n), List.Remove(n).
You'll speed up your application by using a better data structure, namely a BitArray. Assume that items set to True are prime, and those which aren't composite. This means looking up, adding, or removing items from your prime set will be an O(1) operation.
Upvotes: 2
Reputation: 22867
Here is a C# implementation I threw together a while back.
Maybe it'll help!
using System;
using System.Collections.Generic;
namespace sieve050707
{
class MainClass
{
public static void Main(string[] args)
{
Console.WriteLine(”Sieve of Eratosthenes – 05 July 2007″);
int sampleSize = 100;
if (args.Length > 0)
{
sampleSize = Int32.Parse(args[0]);
}
List sampleData = new List();
List primesFound = new List();
for (int counter=2; counter < sampleSize; counter++)
{
sampleData.Add(counter);
}
for (int counter=2; counter < sampleSize; counter++)
{
if(sampleData.Contains(counter))
{
primesFound.Add(counter);
sampleData.Remove(counter);
for (int multiple=counter*2; multiple < sampleSize; multiple = multiple + counter)
{
sampleData.Remove(multiple);
}
}
}
foreach(int prime in primesFound)
{
Console.WriteLine(prime.ToString() + ” is prime”);
}
}
}
}
HTH,
Dan
Upvotes: 1
Reputation: 96879
I am not a VB guy, but I think:
values.Remove(k)
is a waste of time, just set the place to 0, then extract the prime numbers to another list or sort the same list and remove all zeros at once.
Upvotes: 4