Johan
Johan

Reputation: 35194

Increasing performance when comparing two lists

What options do I have when it comes to comparing items in two lists? I'm having some performance issues, and I would like to know if there are any faster alternatives:

int[] foo = { 1, 2, 3, 4, 5 };
int[] bar = { 6, 7, 8, 9, 1 };

var result = foo.Any(x => bar.Contains(x));

Regardless if I use the lambda methods or use a foreach on my own, I assume that the performance loss will still be O(N^2). Can I do anything to affect that?

Upvotes: 3

Views: 93

Answers (3)

Sergey Berezovskiy
Sergey Berezovskiy

Reputation: 236208

You can use Enumerable.Intersect:

var result = foo.Intersect(bar).Any();

That creates Set<T> from bar items and then enumerates foo until first match found. Internally that looks like:

Set<int> set = new Set<int>();

foreach (int local in bar) // M times
    set.Add(local); // O(1)

foreach (int value in foo) // N times max
{
    if (!set.Remove(value)) // O(1)
        continue;

    yield return value;
}

As Patryk Ćwiek correctly pointed, that gives you O(N+M) instead of O(N*M)

Upvotes: 6

Matthew Watson
Matthew Watson

Reputation: 109557

For completeness, here's a benchmark program to test the various answers in this thread.

It seems to show that the HashSet approach is marginally fastest:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace Demo
{
    internal class Program
    {
        private void run()
        {
            var foo = Enumerable.Range(     0, 100000).ToArray();
            var bar = Enumerable.Range(100000, 100000).ToArray();

            int trials = 4;

            Stopwatch sw = new Stopwatch();

            for (int i = 0; i < trials; ++i)
            {
                sw.Restart();
                method1(foo, bar);
                Console.WriteLine("method1()     took " +sw.Elapsed);

                sw.Restart();

                for (int j = 0; j < 100; ++j)
                    method2(foo, bar);

                Console.WriteLine("method2()*100 took " +sw.Elapsed);

                sw.Restart();

                for (int j = 0; j < 100; ++j)
                    method3(foo, bar);

                Console.WriteLine("method3()*100 took " +sw.Elapsed);

                Console.WriteLine();
            }
        }

        private static bool method1(int[] foo, int[] bar)
        {
            return foo.Any(bar.Contains);
        }

        private static bool method2(int[] foo, int[] bar)
        {
            var hashSet = new HashSet<int>(bar);
            return foo.Any(hashSet.Contains);
        }

        private static bool method3(int[] foo, int[] bar)
        {
            return foo.Intersect(bar).Any();
        }

        private static void Main()
        {
            new Program().run();
        }
    }
}

The results (release build) on my PC are as follows. Note that I ran method2() and method3() 100 times each because they are so much faster than method1():

method1()     took 00:00:12.2781951
method2()*100 took 00:00:00.4920760
method3()*100 took 00:00:00.7045298

method1()     took 00:00:11.9267980
method2()*100 took 00:00:00.4688330
method3()*100 took 00:00:00.6886865

method1()     took 00:00:11.8959856
method2()*100 took 00:00:00.4736563
method3()*100 took 00:00:00.6875508

method1()     took 00:00:11.9083229
method2()*100 took 00:00:00.4572404
method3()*100 took 00:00:00.6838919

Upvotes: 2

Selman Gen&#231;
Selman Gen&#231;

Reputation: 101681

You can use a Hashset:

int[] foo = { 1, 2, 3, 4, 5 };
int[] bar = { 6, 7, 8, 9, 1 };
var hashSet = new Hashset<int>(bar);
var result = foo.Any(x => hashSet.Contains(x));

Or you can use Except with Any like this:

var result =  !foo.Except(bar).Any();

I bet that is race with the Sergey's solution :p

Upvotes: 2

Related Questions