Christopher

Reputation: 788

LINQ sorting on objects on subobject

I have a list of objects (Books) and those objects look like this:

class Book
{
    public string book_Name {get; set;}
    public Dictionary<string,string> book_Dictionary {get; set;}

    public Book(string book_Name, Dictionary<string,string> book_Dictionary)
    {
        this.book_Name = book_Name;
        this.book_Dictionary = book_Dictionary;
    }
}

I compile these Book objects into a list of books, so I have

  List<Book> library;

I want to go through this list and sort out any duplicate book objects. (A duplicate book would be a book that has the same name and dictionary as any other book in the list).

To do this, I am trying the following:

    private List<Book> removeDuplicateBooks(List<Book> library)
    {
        List<Book> distinctLibrary = library
           .GroupBy(x => new { x.book_Name, x.book_Dictionary })
           .Select(g => g.First())
           .ToList();

        return distinctLibrary;
    }

However, it looks like this does not remove duplicates... my guess is that the GroupBy is somehow getting thrown for a loop because one of the key members is a dictionary?

EDIT: To remove ambiguity -- When I say it does not remove duplicates, I mean the distinctLibrary that is returned is the same as the library (even though the library does contain duplicate books).

EDIT: Examples: Let's say my library contains the following Book objects:

bookNum1:

name: "book1",

dictionary: {
   Key:"foo"
   Value:"bar"

   Key:"balloon"
   Value:"red"
}

bookNum2:

name: "book2",

dictionary: {
   Key:"foo"
   Value:"bar"

   Key:"balloon"
   Value:"red"
}

bookNum3:

name: "book1",

dictionary: {
   Key:"foo"
   Value:"bar"

   Key:"balloon"
   Value:"red"
}

bookNum4:

name: "book1",

dictionary: {
   Key:"fooey"
   Value:"bar"

   Key:"balloon"
   Value:"red"
}

If I put this library through the removeDuplicateBooks() function, I would expect it to return a library containing the following book objects: bookNum1, bookNum2, bookNum4

Upvotes: 0

Views: 110

Answers (4)

user9472013


Since you already have a method for removing duplicate Books, I suggest this:

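    private void removeDuplicateBooks(List<Book> library)
    {
        // iterate over a copy, because the original list is modified inside the loop
        foreach (Book b in new List<Book>(library))
        {
            // collect every copy of b; this relies on Book.Equals comparing
            // by value (name and dictionary content), not by reference
            List<Book> copies = library.FindAll(x => x.Equals(b));
            // if the book is found more than once
            if (copies.Count > 1)
            {
                // delete all copies of b and keep only one
                library.RemoveAll(x => x.Equals(b));
                library.Add(b);
            }
        }
    }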

Note: this method doesn't return a new library; it only cleans up the library that is passed in.

Upvotes: 0

Tim Schmelter

Reputation: 460098

Dictionary<TKey,TValue> doesn't override Equals or GetHashCode, so GroupBy will just compare references. All those dictionaries are created with new Dictionary... , so they are different references. You need to override Equals and GetHashCode in Book and/or implement IEquatable<Book> and/or provide a custom IEqualityComparer<Book> (for GroupBy).
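To make the reference comparison concrete, here is a minimal sketch. The class name ReferenceEqualityDemo and the helper SameByDefault are made up for illustration and are not part of the question's code. It shows that two dictionaries with identical content are not equal by default, and that an anonymous type containing such a dictionary inherits the problem:

```csharp
using System;
using System.Collections.Generic;

public static class ReferenceEqualityDemo
{
    // Hypothetical helper: true only when both variables point at the same dictionary object,
    // because Dictionary<TKey,TValue> uses the default (reference) Equals.
    public static bool SameByDefault(Dictionary<string, string> a, Dictionary<string, string> b)
    {
        return a.Equals(b);
    }

    public static void Main()
    {
        var first = new Dictionary<string, string> { { "foo", "bar" } };
        var second = new Dictionary<string, string> { { "foo", "bar" } };

        Console.WriteLine(SameByDefault(first, second)); // False: same content, different objects
        Console.WriteLine(SameByDefault(first, first));  // True: same reference

        // Anonymous types compare member by member, so the dictionary member
        // makes two otherwise identical keys unequal.
        var key1 = new { Name = "book1", Dict = first };
        var key2 = new { Name = "book1", Dict = second };
        Console.WriteLine(key1.Equals(key2)); // False
    }
}
```

This is why the GroupBy key in the question never matches for books whose dictionaries were created separately.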

public class Book : IEquatable<Book>
{
    public string BookName { get; }
    public Dictionary<string, string> BookDictionary { get; }

    public Book(string bookName, Dictionary<string, string> bookDictionary)
    {
        this.BookName = bookName;
        this.BookDictionary = bookDictionary ?? throw new ArgumentNullException(nameof(bookDictionary));
    }

    public bool Equals(Book other)
    {
        if (ReferenceEquals(null, other))
        {
            return false;
        }

        if (ReferenceEquals(this, other))
        {
            return true;
        }

        if (!string.Equals(BookName, other.BookName))
            return false;
        if (BookDictionary.Count != other.BookDictionary.Count)
            return false;
        return BookDictionary.All(kv => other.BookDictionary.ContainsKey(kv.Key) 
                                     && other.BookDictionary[kv.Key] == kv.Value);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj))
        {
            return false;
        }

        if (ReferenceEquals(this, obj))
        {
            return true;
        }

        if (obj.GetType() != this.GetType())
        {
            return false;
        }

        return Equals((Book) obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // Combine the entries with XOR so the result does not depend on the
            // dictionary's enumeration order; equal books must yield equal hashes.
            int dictHash = 17;
            foreach (KeyValuePair<string, string> kv in this.BookDictionary)
            {
                dictHash ^= kv.Key.GetHashCode() * 23 + (kv.Value ?? "").GetHashCode();
            }

            return ((BookName != null ? BookName.GetHashCode() : 0) * 397) ^ dictHash;
        }
    }
}

Now you can use the Book itself as key in dictionaries or as argument for GroupBy:

List<Book> distinctLibrary = library
   .GroupBy(book => book)
   .Select(g => g.First())
   .ToList();

and now you could even use the more efficient (and simpler):

List<Book> distinctLibrary = library.Distinct().ToList();
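The answer also mentions a custom IEqualityComparer<Book> as an alternative, which is useful if you cannot or do not want to change Book itself. The following is only a sketch under some assumptions: it uses the Book shape from the question (book_Name, book_Dictionary), the names BookContentComparer and ComparerDemo are made up for illustration, and it assumes book_Dictionary is never null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's Book class (no Equals/GetHashCode overrides).
public class Book
{
    public string book_Name { get; set; }
    public Dictionary<string, string> book_Dictionary { get; set; }

    public Book(string book_Name, Dictionary<string, string> book_Dictionary)
    {
        this.book_Name = book_Name;
        this.book_Dictionary = book_Dictionary;
    }
}

// Hypothetical comparer: two books are equal when the name and every
// key/value pair match. Assumes book_Dictionary is never null.
public class BookContentComparer : IEqualityComparer<Book>
{
    public bool Equals(Book x, Book y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        if (!string.Equals(x.book_Name, y.book_Name)) return false;
        if (x.book_Dictionary.Count != y.book_Dictionary.Count) return false;

        return x.book_Dictionary.All(kv =>
            y.book_Dictionary.TryGetValue(kv.Key, out string value) && value == kv.Value);
    }

    public int GetHashCode(Book book)
    {
        if (book is null) return 0;
        unchecked
        {
            // XOR keeps the hash independent of the dictionary's enumeration order.
            int hash = book.book_Name != null ? book.book_Name.GetHashCode() : 0;
            foreach (KeyValuePair<string, string> kv in book.book_Dictionary)
            {
                hash ^= kv.Key.GetHashCode() * 23 + (kv.Value ?? "").GetHashCode();
            }
            return hash;
        }
    }
}

public static class ComparerDemo
{
    public static List<Book> RemoveDuplicateBooks(List<Book> library)
    {
        return library.Distinct(new BookContentComparer()).ToList();
    }

    public static void Main()
    {
        // The four books from the question, each with its own dictionary instance.
        var library = new List<Book>
        {
            new Book("book1", new Dictionary<string, string> { { "foo", "bar" }, { "balloon", "red" } }),
            new Book("book2", new Dictionary<string, string> { { "foo", "bar" }, { "balloon", "red" } }),
            new Book("book1", new Dictionary<string, string> { { "foo", "bar" }, { "balloon", "red" } }),
            new Book("book1", new Dictionary<string, string> { { "fooey", "bar" }, { "balloon", "red" } })
        };

        Console.WriteLine(RemoveDuplicateBooks(library).Count); // 3
    }
}
```

The same comparer instance can also be passed as the second argument to GroupBy if you need the groups themselves rather than just the distinct books.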

Upvotes: 3

arslanaybars

Reputation: 1853

@TimSchmelter's answer looks quite nice, but I would also like to share my approach (I tried to keep it simple :) ). Criticism is welcome, especially from @Tim Schmelter.

The Main method, using the dummy data that you shared:

    static void Main(string[] args)
    {
        Dictionary<string, string> dict = new Dictionary<string, string>()
        {
            { "foo", "bar"},
            { "balloon", "red"}
        };

        Dictionary<string, string> dict2 = new Dictionary<string, string>()
        {
            { "fooey", "bar"},
            { "balloon", "red"}
        };

        var books = new List<Book>();
        books.Add(new Book("book1", dict));
        books.Add(new Book("book2", dict));
        books.Add(new Book("book1", dict));
        books.Add(new Book("book1", dict2));

        var distinctLib = RemoveDuplicateBooks(books);
    }

LINQ query:

    private static List<Book> RemoveDuplicateBooks(List<Book> library)
    {
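        // Note: the anonymous key compares book_Dictionary by reference, so books are grouped
        // together only when they share the same dictionary instance (as in Main above).
        var distinctLib = from c in library
                        group c by new
                        {
                            c.book_Name,
                            c.book_Dictionary
                        } into temp
                        // Book only has the two-argument constructor, so use it here
                        select new Book(temp.First().book_Name, temp.First().book_Dictionary);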

        return distinctLib.ToList();

    }

Result:

Book1, Book2 and Book4

Upvotes: 1

Toni Kostelac

Reputation: 361

When you want to compare a custom type you have to implement the IComparable interface. This then allows for determining if an object is "greater than" or "less than" or equal.

How you implement this depends on your business logic.

Upvotes: 1
