Reputation: 13892
I'm intersecting a set of 100,000 numbers and a set of 1,000 numbers using set_intersection in STL and its taking 21s, where it takes 11ms in C#.
C++ Code:
int runIntersectionTestAlgo()
{
set<int> set1;
set<int> set2;
set<int> intersection;
// Create 100,000 values for set1
for ( int i = 0; i < 100000; i++ )
{
int value = 1000000000 + i;
set1.insert(value);
}
// Create 1,000 values for set2
for ( int i = 0; i < 1000; i++ )
{
int random = rand() % 200000 + 1;
random *= 10;
int value = 1000000000 + random;
set2.insert(value);
}
set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
return intersection.size();
}
C# Code:
static int runIntersectionTest()
{
Random random = new Random(DateTime.Now.Millisecond);
Dictionary<int,int> theMap = new Dictionary<int,int>();
List<int> set1 = new List<int>();
List<int> set2 = new List<int>();
// Create 100,000 values for set1
for ( int i = 0; i < 100000; i++ )
{
int value = 1000000000 + i;
set1.Add(value);
}
// Create 1,000 values for set2
for ( int i = 0; i < 1000; i++ )
{
int value = 1000000000 + (random.Next() % 200000 + 1);
set2.Add(value);
}
// Now intersect the two sets by populating the map
foreach( int value in set1 )
{
theMap[value] = 1;
}
int intersectionSize = 0;
foreach ( int value in set2 )
{
int count;
if ( theMap.TryGetValue(value, out count ) )
{
intersectionSize++;
theMap[value] = 2;
}
}
return intersectionSize;
}
}
Upvotes: 1
Views: 3083
Reputation: 25834
Your C# and C++ code work differently. The C# code uses magical hashing tricks for speed, your C++ code uses tree tricks for speed. One thing that will probably speed things up (ignoring the fact that your testing seems to be broken) would be to using hashing, as follows:
hash_map
of one of the two collections.Upvotes: 0
Reputation: 5733
I ran your C++ code on my linux box
$ time ./test
real 0m0.073s
user 0m0.060s
sys 0m0.003s
21s means to me you compiled without optimizations. In case you use MSVC make sure you have listed
_SECURE_SCL=0
(see msdn) in the compile definitions. Otherwise all STL iterator operations are dog slow.
Upvotes: 5
Reputation: 247999
On this ancient 3GHz Pentium 4, I get 2734 milliseconds for the entire runIntersectionTestAlgo
function, in a debug build with optimizations disabled. I compiled with VS2008 SP1.
If I enable optimizations, I get 93 milliseconds.
Here's my code:
#include <set>
#include <algorithm>
using namespace std;
int runIntersectionTestAlgo()
{
set<int> set1;
set<int> set2;
set<int> intersection;
// Create 100,000 values for set1
for ( int i = 0; i < 100000; i++ )
{
int value = 1000000000 + i;
set1.insert(value);
}
// Create 1,000 values for set2
for ( int i = 0; i < 1000; i++ )
{
int random = rand() % 200000 + 1;
random *= 10;
int value = 1000000000 + random;
set2.insert(value);
}
set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
return intersection.size();
}
#include <windows.h>
#include <iostream>
int main(){
DWORD start = GetTickCount();
runIntersectionTestAlgo();
DWORD span = GetTickCount() - start;
std::cout << span << " milliseconds\n";
}
Disabling _SECURE_SCL
made no difference for the release build, which still hovered around the 100 ms.
GetTickCount
isn't ideal, of course, but it should be good enough to distinguish 21 seconds from less than 100 milliseconds.
So I conclude that there's something wrong with your benchmarks.
Upvotes: 2
Reputation: 21721
I updated your example to use some timer code that I use when unit testing. On my machine I get the following timings (based on -O3):
First loop 0.0040654
Second loop 4.8e-05
Intersection 0.000349
Intersection size: 50
Based on that, if I'm reading my decimals correctly, it takes '4ms' to insert the items into the first set, 50 microseconds to insert the items into the second set and a 1/3 of a ms to perform the intersection.
I am unable to run your C# example on my machine, so I cannot compare the timing, but it's definitely not 21s as you post.
Upvotes: 1
Reputation: 4735
A couple things would make your two examples more comparable.
First, your example in STL isn't quite right, for one thing both sets should be sorted in ascending order (in STL speak a "strict weak ordering").
Second, your using "sets" which are implemented as trees in STL, vs. "lists" which are linked lists. Random inserts into a set is more expensive than inserting onto the end of a list.
Try using a list of ints in the C++ example and also sort the list first (otherwise set inersection won't work properly) and I think you'll see more favorable results.
Upvotes: 8