Paul

Reputation: 3954

Reducing duplicates in a dictionary using LINQ in C#

I'm trying to use LINQ to create a new dictionary from an existing one, removing duplicates in the process.

The existing dictionary is as follows:

Dictionary<LoaderConfig, List<ColumnInfo>> InvalidColumns = new Dictionary<LoaderConfig, List<ColumnInfo>>();

public struct LoaderConfig
{
    public string ObjectName { get; set; }
    public DateTime? LoadDate { get; set; }
    public string Load { get; set; }
    public string TableName { get; set; }
}

public struct ColumnInfo
{
    public string ColumnName { get; set; }
    public string DataType { get; set; }
    public int DataLength { get; set; }
}

What I want to end up with is a Dictionary<string, List<ColumnInfo>> where the key is the TableName property of the LoaderConfig objects and each value is the list of unique ColumnInfo objects for that TableName.

I started with this based on another post that I found:

var alterations = InvalidColumns
    .GroupBy(pair => pair.Key.TableName)
    .Select(group => group.First())
    .ToDictionary(pair => pair.Key.TableName, pair => pair.Value);

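This doesn't work because First() keeps only the first pair in each group, so the column lists of every other LoaderConfig with the same TableName are discarded. I imagine there is a way to achieve this using LINQ extension methods; I just need some help finding it.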

Thanks!

Upvotes: 2

Views: 1116

Answers (3)

Henk Holterman

Reputation: 273179

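  // untested
  var alterations = InvalidColumns
             .GroupBy(pair => pair.Key.TableName)
             .ToDictionary(group => group.Key,
                           // merge every column list for the table, drop duplicates, materialize as a List
                           group => group.SelectMany(g => g.Value).Distinct().ToList());

You also have to work in a .Distinct() somehow.

Edit: a Distinct() was indeed needed, so I added it. The trailing ToList() makes the values List<ColumnInfo>, as the question asks.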

Upvotes: 1

Olivier Jacot-Descombes

Reputation: 112324

Dictionary<string, List<ColumnInfo>> alterations = InvalidColumns
    .SelectMany(p => p.Value, (p, col) => new { p.Key.TableName, col })
    .GroupBy(single => single.TableName, single => single.col)
    .ToDictionary(g => g.Key, g => g.Distinct().ToList());

SelectMany flattens the column lists, i.e. it creates an enumeration of (table name, single column) pairs. This enumeration is then regrouped by table name, and Distinct() removes the duplicate columns within each group.
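To see the flattening and regrouping in action, here is a minimal, self-contained sketch. The sample data (a "Customers" table, two loader configs "A" and "B", and the columns Id and Name) is made up for illustration and is not from the question. It also relies on the default equality of structs, which compares all fields, so Distinct() works here without a custom comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct LoaderConfig
{
    public string ObjectName { get; set; }
    public DateTime? LoadDate { get; set; }
    public string Load { get; set; }
    public string TableName { get; set; }
}

public struct ColumnInfo
{
    public string ColumnName { get; set; }
    public string DataType { get; set; }
    public int DataLength { get; set; }
}

public static class Demo
{
    public static Dictionary<string, List<ColumnInfo>> Build()
    {
        // Hypothetical sample data: two loader configs that target the same table
        // and share the "Id" column.
        var id = new ColumnInfo { ColumnName = "Id", DataType = "int", DataLength = 4 };
        var name = new ColumnInfo { ColumnName = "Name", DataType = "varchar", DataLength = 50 };

        var invalidColumns = new Dictionary<LoaderConfig, List<ColumnInfo>>
        {
            { new LoaderConfig { ObjectName = "A", TableName = "Customers" }, new List<ColumnInfo> { id, name } },
            { new LoaderConfig { ObjectName = "B", TableName = "Customers" }, new List<ColumnInfo> { id } },
        };

        return invalidColumns
            // one (TableName, col) pair per column in every list
            .SelectMany(p => p.Value, (p, col) => new { p.Key.TableName, col })
            // regroup the single columns by table name
            .GroupBy(single => single.TableName, single => single.col)
            // drop duplicate columns per table and materialize the list
            .ToDictionary(g => g.Key, g => g.Distinct().ToList());
    }

    public static void Main()
    {
        foreach (var entry in Build())
            Console.WriteLine($"{entry.Key}: {string.Join(", ", entry.Value.Select(c => c.ColumnName))}");
    }
}
```

With this data the result should contain a single "Customers" entry holding two columns, Id and Name, even though Id appears in both source lists.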

Upvotes: 1

Arnaud F.

Reputation: 8452

Personally, I'd do something like this:

First, create an IEqualityComparer<ColumnInfo> for ColumnInfo (used by Distinct):

    public struct ColumnInfo
    {
        public string ColumnName { get; set; }
        public string DataType { get; set; }
        public int DataLength { get; set; }

        public class ColumnNameComparer : IEqualityComparer<ColumnInfo>
        {
            public bool Equals(ColumnInfo x, ColumnInfo y)
            {
                return x.ColumnName == y.ColumnName;
            }

            public int GetHashCode(ColumnInfo obj)
            {
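                // guard against a null ColumnName, which Equals above already tolerates
                return obj.ColumnName == null ? 0 : obj.ColumnName.GetHashCode();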
            }
        }
    }

Then the query:

        var colComparer = new ColumnInfo.ColumnNameComparer();
        Dictionary<string, List<ColumnInfo>> res = InvalidColumns
            .GroupBy(i => i.Key.TableName)
            // Distinct runs after SelectMany so duplicates across different lists of the same table are removed too
            .ToDictionary(i => i.Key, i => i.SelectMany(j => j.Value).Distinct(colComparer).ToList());

Upvotes: 3
