ES2018

Reputation: 438

Distinct on List<List<double>>

genesUsingCrossover is a List<List<double>>.

I am using the following line of code to count the distinct List<double> within List<List<double>>:

int count = genesUsingCrossover.Distinct().Count();

and I am not sure it is correct. The number of elements in genesUsingCrossover is 1250, and genesUsingCrossover.Distinct().Count() also returns 1250, so I assumed that all the lists are distinct. However, looking at the watch window, I noticed that the third and fourth lists are the same.

So I assume that the line of code is not correct. Is there a way to improve it so that it counts the number of distinct elements?


Upvotes: 4

Views: 501

Answers (3)

Antoine V

Reputation: 7204

In fact, you haven't defined the criteria by which two lists are considered equal. By default, .NET therefore checks whether the two lists are the same reference in memory, because List<T> is a reference type.

Obviously, each list is a separate object in memory. So your list has 1250 elements, and Distinct returns 1250 distinct elements.
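To see this default behaviour in isolation, here is a minimal sketch. The class and method names (ReferenceEqualityDemo, CountWithDefaultComparer) are made up for illustration; only Distinct and Count come from LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReferenceEqualityDemo
{
    // Counts distinct lists using the default comparer,
    // which for List<T> means reference equality.
    public static int CountWithDefaultComparer()
    {
        var a = new List<double> { 1.2, 1.5 };
        var b = new List<double> { 1.2, 1.5 };          // same contents, different object
        var lists = new List<List<double>> { a, b, a }; // 'a' appears twice

        return lists.Distinct().Count();
    }

    public static void Main()
    {
        // a and b are counted separately; only the repeated reference to a is dropped.
        Console.WriteLine(CountWithDefaultComparer()); // prints 2
    }
}
```

Two lists with identical contents still count as two, which matches what the question observed.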

From your description, I think your criterion is: two lists containing the same elements should be equal.

Distinct can take an IEqualityComparer, so the idea is to implement IEqualityComparer<List<double>>:

class NumberDoubles: IEqualityComparer<List<double>>
{
    public bool Equals(List<double> x, List<double> y)
    {
        //Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;

        //Check whether any of the compared objects is null.
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;

        if (x.Count!= y.Count)
            return false;

        //Check whether the lists' values are equal.
        for(int i = 0; i < x.Count; i++){
            if(x[i] != y[i])
                return false;
        }

        // If we got this far, the lists are equal
        return true;
    }

    // If Equals() returns true for a pair of objects 
    // then GetHashCode() must return the same value for these objects.

    public int GetHashCode(List<double> doubleArray)
    {
        //Check whether the object is null
        if (Object.ReferenceEquals(doubleArray, null)) return 0;

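        //Calculate the hash code for the list.
        //Use each double's own hash code: iterating with "int" would truncate
        //the values, so e.g. 1.2 and 1.5 would both contribute 1.
        int hashCode = 17;
        unchecked
        {
            foreach (double d in doubleArray)
            {
                //Multiplying before adding makes the result depend on element order.
                hashCode = hashCode * 31 + d.GetHashCode();
            }
        }
        return hashCode;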
    }
}

and your code:

int count = genesUsingCrossover.Distinct(new NumberDoubles()).Count();

Upvotes: 1

kuskmen

Reputation: 3775

Like @Joe said, you should really be asking yourself "Do I really need List<List<double>>?" To me it sounds like an inappropriate structure for the job. Distinctness comes out of the box with HashSet (well, not entirely in your case, but in general, whenever someone sees HashSet, it hopefully rings a bell that uniqueness is desired, while List<List<double>> doesn't necessarily show that).

That being said, I suggest the following solution with HashSet<List<double>>:

using System;
using System.Collections.Generic;
using System.Linq;

public static class Program {
    public static void Main() {
        var hashSet = new HashSet<List<double>>(new ListComparer());

        hashSet.Add(new List<double> { 1.2d, 1.5d });
        hashSet.Add(new List<double> { 1.2d, 1.5d });

        Console.Write(hashSet.Count);
    }

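    public class ListComparer : IEqualityComparer<List<double>>
    {
        public bool Equals(List<double> x, List<double> y)
        {
            // Your logic for equality goes here. As one option,
            // compare the lists element by element, in order.
            if (ReferenceEquals(x, y)) return true;
            if (x == null || y == null) return false;
            return x.SequenceEqual(y);
        }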

        public int GetHashCode(List<double> obj)
        {
           int hash = 0;
           unchecked {
               foreach(var d in obj) {
                   hash += d.GetHashCode();
               }
           }
           return hash;
        }  
    }
}

Keep in mind that the Equals method will be called many times, so it is worth keeping performance in mind.

Upvotes: 0

Joe Farrell

Reputation: 3542

List<T> doesn't override Object.Equals, so two List<double> objects will only be considered equal if they're reference-equal. (Here is the implementation of Distinct<T>() if you want to see how it works.) It sounds like you want to consider two lists to be equal if the elements that compose them are equal. For this you can use an overload of Distinct<T>() that takes an IEqualityComparer<T>, which it will use to determine whether two lists are equal. So in your case you could provide an implementation of IEqualityComparer<List<double>> that expresses your idea of list-equality.

What that implementation will look like depends on exactly when you want to consider two lists to be equal. For instance, must they have the same set of elements in the same order, or is order not relevant? There are other questions on Stack Overflow that explain how to implement both. In either case, keep in mind that Distinct() is going to invoke your implementation many times, so it's important that your algorithm perform well. To that end, it might be worth asking whether List<List<double>> is really the data structure that you want or whether some other choice might better suit.
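As a rough sketch of the case where order is not relevant, the comparer below treats two lists as equal when they contain the same values the same number of times. The names UnorderedListComparer and UnorderedDemo are hypothetical, and treating the lists as multisets (rather than as sets that ignore duplicates) is an assumption about what you want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two lists are equal when they hold the same values
// the same number of times, regardless of order.
public sealed class UnorderedListComparer : IEqualityComparer<List<double>>
{
    public bool Equals(List<double> x, List<double> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        if (x.Count != y.Count) return false;

        // Compare sorted copies so that element order does not matter.
        return x.OrderBy(d => d).SequenceEqual(y.OrderBy(d => d));
    }

    public int GetHashCode(List<double> list)
    {
        if (list == null) return 0;

        // Addition is commutative, so the hash does not depend on element order.
        int hash = 0;
        unchecked
        {
            foreach (double d in list)
            {
                hash += d.GetHashCode();
            }
        }
        return hash;
    }
}

public static class UnorderedDemo
{
    public static int CountDistinct(List<List<double>> lists)
    {
        return lists.Distinct(new UnorderedListComparer()).Count();
    }

    public static void Main()
    {
        var lists = new List<List<double>>
        {
            new List<double> { 1.2, 1.5 },
            new List<double> { 1.5, 1.2 }, // same values, different order
            new List<double> { 1.2, 1.2 },
        };
        Console.WriteLine(CountDistinct(lists)); // prints 2
    }
}
```

Sorting on every comparison costs O(n log n) per call, so for large inputs it may be cheaper to keep each inner list sorted up front and use a plain order-sensitive comparison.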

Upvotes: 3
