superstar

Reputation: 1832

Function timed out for large list (LINQ query in C#)

I am using the following query

var queryList1Only = (from file in list1
                      select file).Except(list2, myFileCompare);

where myFileCompare compares two files based on their name and length.

The query returned results when list1 and list2 were small (say 100 files each, while I was testing), but when I increased list1 to 30,000 files and list2 to 20,000 files, the query now says "Function Evaluation Timed Out".

I searched online and found that debugging could cause this, so I removed all the breakpoints and ran the code again; now the program just freezes, without any output for queryList1Only, which I am trying to print out as a check.

EDIT: This is the code for myFileCompare

class FileCompare : System.Collections.Generic.IEqualityComparer<System.IO.FileInfo>
    {

        public FileCompare() { }

        public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
        {
            return (f1.Name == f2.Name && f1.Directory.Name == f2.Directory.Name && 
                    f1.Length == f2.Length);
        }

        // Return a hash that reflects the comparison criteria. According to the 
        // rules for IEqualityComparer<T>, if Equals is true, then the hash codes must
        // also be equal. Because equality as defined here is a simple value equality, not
        // reference identity, it is possible that two or more objects will produce the same
        // hash code.
        public int GetHashCode(System.IO.FileInfo fi)
        {
            string s = String.Format("{0}{1}", fi.Name, fi.Length);
            return s.GetHashCode();
        }

    }

Upvotes: 8

Views: 4034

Answers (3)

sll

Reputation: 62544

What do you need to do with the items returned by the query? In general, heavy operations like this are best executed in a separate thread, to avoid exactly the situation you've just faced.

EDIT: An idea

As a first approach, you can try the following algorithm:

  • Sort the items in both lists (List<T>.Sort() uses quicksort internally); it will be pretty fast with a good comparison function
  • Then traverse both sorted lists in a well-known for() loop, comparing elements as you go
  • When you run past the last index of one list, select all remaining items from the other list as different (basically, they do not exist in the former list at all)

I believe you'll get much better performance with sorted arrays; the complexity of Except(), I believe, is O(m*n). A sketch of this idea follows.
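
Below is a minimal sketch of that idea. It is my own illustration rather than code from the answer: it sorts both lists with a comparison that mirrors the equality criteria (name, then length, as described in the question), then walks the two lists in a single merge-style pass instead of comparing items at the same index, since the lists can have different lengths. SortedExcept and CompareFiles are made-up names.

using System.Collections.Generic;
using System.IO;

static int CompareFiles(FileInfo a, FileInfo b)
{
    // Order by name first, then by length, mirroring the equality criteria
    int byName = string.CompareOrdinal(a.Name, b.Name);
    return byName != 0 ? byName : a.Length.CompareTo(b.Length);
}

static List<FileInfo> SortedExcept(List<FileInfo> list1, List<FileInfo> list2)
{
    list1.Sort(CompareFiles);   // List<T>.Sort is O(n log n)
    list2.Sort(CompareFiles);

    var result = new List<FileInfo>();
    int j = 0;
    foreach (var file in list1)
    {
        // Skip list2 entries that sort strictly before the current file
        while (j < list2.Count && CompareFiles(list2[j], file) < 0)
            j++;

        // No matching entry left in list2, so the file exists only in list1
        if (j >= list2.Count || CompareFiles(list2[j], file) != 0)
            result.Add(file);
    }
    return result;
}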

EDIT: Another idea, which should be really fast

  • Store the items from one server in a Set<T>
  • Then loop through the second array and look each item up in the Set<T>; it will be VERY fast! Basically it is O(m) to build the set plus O(n) for the lookups, because you need to traverse only a single array, and searching within a set with a good hash function (use the GetHashCode() I've provided with the updated logic) is very quick. Try it out!
// some kind of C# pseudocode ;)
public IEnumerable<FileInfo> GetDifference(
    IEnumerable<FileInfo> firstServerFiles,
    IEnumerable<FileInfo> secondServerFiles)
{
    // Build the set with the custom comparer; otherwise FileInfo
    // falls back to reference equality and nothing would ever match
    ISet<FileInfo> firstServerFilesMap =
        new HashSet<FileInfo>(firstServerFiles, new FileCompare());

    foreach (var secondServerFile in secondServerFiles)
    {
        // Contains() is a hash lookup, roughly O(1) per item
        if (!firstServerFilesMap.Contains(secondServerFile))
        {
            yield return secondServerFile;
        }
    }
}

EDIT: More details regarding equality logic were provided in comments

Try out this implementation:

public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
{
    if (f1 == null || f2 == null)
    {
        return false;
    }

    return (f1.Name == f2.Name && f1.Directory.Name == f2.Directory.Name &&
            f1.Length == f2.Length);
}

public int GetHashCode(System.IO.FileInfo fi)
{
    unchecked
    {
        int hash = 17;    
        hash = hash * 23 + fi.Name.GetHashCode();
        hash = hash * 23 + fi.Directory.Name.GetHashCode();
        hash = hash * 23 + fi.Length.GetHashCode();

        return hash;
    }
}
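
A comparer like this then plugs straight into Except() or into a HashSet<FileInfo>. A minimal usage sketch, assuming list1 and list2 are the lists from the question:

var myFileCompare = new FileCompare();

// LINQ set difference driven by the custom comparer (requires System.Linq)
var queryList1Only = list1.Except(list2, myFileCompare).ToList();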

Upvotes: 3

Francisco

Reputation: 4101

I haven't tried this myself, but here is an idea: implement list1 as a HashSet, like this:

HashSet<FileInfo> List1 = new HashSet<FileInfo>(myFileCompare);

Add all files:

foreach(var file in files)
{
    List1.Add(file);
}

Then remove elements:

List1.ExceptWith(list2);

Then enumerate:

foreach(var file in List1)
{
    //do something
}

I think it's faster, but as I said, I haven't tried it. See the MSDN documentation on HashSet<T> for general information.

Edit: Or better yet, you can initialize and add data in one step:

HashSet<FileInfo> List1 = new HashSet<FileInfo>(files, myFileCompare);
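
Putting the pieces together, a sketch (list1, list2 and myFileCompare are assumed to be the variables from the question):

HashSet<FileInfo> list1Only = new HashSet<FileInfo>(list1, myFileCompare);
list1Only.ExceptWith(list2);   // removes matches using the comparer's hashes

foreach (var file in list1Only)
{
    Console.WriteLine(file.FullName);   // or whatever you need per file
}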

Upvotes: 1

McKay

Reputation: 12614

I'd recommend removing the length from the hash code and just using fi.FullName. That still satisfies the uniqueness guideline, though there may be hash collisions in the cases where you think the length is needed to distinguish files. But that is probably preferable to a longer Except execution. Similarly, change your equality comparison from name plus directory to FullName; that would probably be more performant as well. A sketch is below.
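
This is my illustration of that suggestion (FullNameCompare is a made-up name):

class FullNameCompare : System.Collections.Generic.IEqualityComparer<System.IO.FileInfo>
{
    public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
    {
        if (f1 == null || f2 == null)
        {
            return false;
        }

        // FullName already encodes both the directory and the file name
        return f1.FullName == f2.FullName;
    }

    public int GetHashCode(System.IO.FileInfo fi)
    {
        return fi.FullName.GetHashCode();
    }
}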

Upvotes: -1
