schellack

Reputation: 10274

How to add to a dictionary when there may be duplicate keys?

A bunch of key/value pairs, from an object that may have duplicate keys, need to be added to a dictionary. Only the first distinct instance of a key (and the instance's value) should be added to the dictionary.

Below is an example implementation that appears, at first, to work fine.

void Main()
{
    Dictionary<long, DateTime> items = new Dictionary<long, DateTime>();
    items = AllItems.Select(item =>
                    {
                        long value;
                        bool parseSuccess = long.TryParse(item.Key, out value);
                        return new { value = value, parseSuccess, item.Value };
                    })
                    .Where(parsed => parsed.parseSuccess && !items.ContainsKey(parsed.value))
                    .Select(parsed => new { parsed.value, parsed.Value })
                    .Distinct()
                    .ToDictionary(e => e.value, e => e.Value);
    Console.WriteLine(string.Format("Distinct: {0}{1}Non-distinct: {2}",items.Count, Environment.NewLine, AllItems.Count));

}

public List<KeyValuePair<string, DateTime>> AllItems
{
    get
    {
        List<KeyValuePair<string, DateTime>> toReturn = new List<KeyValuePair<string, DateTime>>();
        for (int i = 1000; i < 1100; i++)
        {
            toReturn.Add(new KeyValuePair<string, DateTime>(i.ToString(), DateTime.Now));
            toReturn.Add(new KeyValuePair<string, DateTime>(i.ToString(), DateTime.Now));
        }
        return toReturn;
    }
}

If AllItems is modified to return many more pairs, however, then an ArgumentException occurs: "An item with the same key has already been added."

void Main()
{
    Dictionary<long, DateTime> items = new Dictionary<long, DateTime>();
    var AllItems = PartOne.Union(PartTwo);
    Console.WriteLine("Total items: " + AllItems.Count());
    items = AllItems.Select(item =>
                    {
                        long value;
                        bool parseSuccess = long.TryParse(item.Key, out value);
                        return new { value = value, parseSuccess, item.Value };
                    })
                    .Where(parsed => parsed.parseSuccess && !items.ContainsKey(parsed.value))
                    .Select(parsed => new { parsed.value, parsed.Value })
                    .Distinct()
                    .ToDictionary(e => e.value, e => e.Value);
    Console.WriteLine("Distinct: {0}{1}Non-distinct: {2}",items.Count, Environment.NewLine, AllItems.Count());

}

public IEnumerable<KeyValuePair<string, DateTime>> PartOne
{
    get
    {
        for (int i = 10000000; i < 11000000; i++)
        {
            yield return (new KeyValuePair<string, DateTime>(i.ToString(), DateTime.Now));
        }
    }
}
public IEnumerable<KeyValuePair<string, DateTime>> PartTwo
{
    get
    {
        for (int i = 10000000; i < 11000000; i++)
        {
            yield return (new KeyValuePair<string, DateTime>(i.ToString(), DateTime.Now));
        }
    }
}

What is the best way to accomplish this? Note that the solution must keep using long.TryParse, because the real input may contain keys that are not valid Int64 values.

Upvotes: 1

Views: 3203

Answers (4)

Dylan Smith

Reputation: 22235

I haven't tested this yet, but something like this with a GroupBy should work.

items = AllItems.Select(item =>
                {
                    long value;
                    bool parseSuccess = long.TryParse(item.Key, out value);
                    return new { value = value, parseSuccess, item.Value };
                })
                .Where(parsed => parsed.parseSuccess && !items.ContainsKey(parsed.value))
                .Select(parsed => new { parsed.value, parsed.Value })
                .GroupBy(x => x.value)
                // Min picks the earliest DateTime in each group, which is not
                // necessarily the first one encountered in the source.
                .Select(x => new {value = x.Key, Value = x.Min(y => y.Value)})
                .ToDictionary(e => e.value, e => e.Value);

Upvotes: 1

BrokenGlass

Reputation: 160852

Let's see: your Select() currently projects to the anonymous type

new { value = value, parseSuccess, item.Value };

Then you filter out all items where parsing failed, so essentially you have

new { value = value, true, item.Value };

Now you use Distinct() on the remaining items. Here, every distinct combination of (value, Value) counts as a separate item, so two items with the same key but different values both survive. That means you can have, e.g., (1,2) and (1,3).

Finally you create your dictionary, but you may still have duplicate value keys, as in the (1,2) and (1,3) example. This explains why you get the exception.

As already posted, GroupBy() is the way to go in this case, and it simplifies your expression.
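To make the Distinct() behavior concrete, here is a minimal sketch. The class name DistinctDemo, the method Run, and the sample dates are made up for illustration; they are not from the question's code.

```csharp
using System;
using System.Linq;

public static class DistinctDemo
{
    // Hypothetical helper: reports how many items survive Distinct() and
    // whether ToDictionary then throws on a duplicate key.
    public static (int distinctCount, bool threw) Run()
    {
        var pairs = new[]
        {
            new { value = 1L, Value = new DateTime(2020, 1, 1) },
            new { value = 1L, Value = new DateTime(2020, 1, 2) }, // same key, different Value
            new { value = 1L, Value = new DateTime(2020, 1, 1) }, // exact duplicate of the first
        };

        // Anonymous types compare all of their properties, so only the
        // exact duplicate is removed; two items with key 1 remain.
        var distinct = pairs.Distinct().ToList();

        bool threw = false;
        try
        {
            distinct.ToDictionary(e => e.value, e => e.Value);
        }
        catch (ArgumentException)
        {
            // Same exception type as in the question: the key 1 appears twice.
            threw = true;
        }
        return (distinct.Count, threw);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine(result.distinctCount); // 2
        Console.WriteLine(result.threw);         // True
    }
}
```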

Upvotes: 1

Enigmativity

Reputation: 117027

I would look at cleaning a few things up.

Wrapping long.TryParse in a Func<string, long?> works better in a LINQ query, because it avoids the out parameter.

Func<string, long?> tryParse = t =>
{
    long v;
    if (!long.TryParse(t, out v))
    {
        return null;
    }
    return v;
};

Then the query looks like this:

var query =
    from item in AllItems
    let keyValue = tryParse(item.Key)
    where keyValue.HasValue
    group item.Value by keyValue.Value into g
    select new
    {
        key = g.Key,
        value = g.First(),
    };

And finally create the dictionary:

var items = query.ToDictionary(x => x.key, x => x.value);

Fairly simple.

Thanks for providing all the code required to test the solution.

Upvotes: 4

Ahmad Mageed

Reputation: 96477

Only the first distinct instance of a key (and the instance's value) should be added to the dictionary.

You can achieve this by using the Enumerable.GroupBy method and taking the first value in the group:

items = AllItems.Select(item =>
                {
                    long value;
                    bool parseSuccess = long.TryParse(item.Key, out value);
                    return new { Key = value, parseSuccess, item.Value };
                })
                .Where(parsed => parsed.parseSuccess)
                .GroupBy(o => o.Key)
                .ToDictionary(e => e.Key, e => e.First().Value);
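
For comparison, a plain loop gives the same first-wins result without grouping the whole input first. This is a different technique from the GroupBy query, shown only as a rough sketch: the class name FirstWinsLoop, the method Build, and the sample data are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FirstWinsLoop
{
    // Hypothetical helper: keeps the first value seen for each key that parses.
    public static Dictionary<long, DateTime> Build(
        IEnumerable<KeyValuePair<string, DateTime>> source)
    {
        var result = new Dictionary<long, DateTime>();
        foreach (var item in source)
        {
            long key;
            // Skip keys that are not valid Int64 values, and keys already stored.
            if (long.TryParse(item.Key, out key) && !result.ContainsKey(key))
            {
                result.Add(key, item.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        var input = new List<KeyValuePair<string, DateTime>>
        {
            new KeyValuePair<string, DateTime>("1", new DateTime(2020, 1, 1)),
            new KeyValuePair<string, DateTime>("abc", new DateTime(2020, 1, 5)), // does not parse
            new KeyValuePair<string, DateTime>("1", new DateTime(2020, 1, 2)),   // duplicate key, ignored
            new KeyValuePair<string, DateTime>("2", new DateTime(2020, 1, 3)),
        };

        var items = Build(input);
        Console.WriteLine(items.Count);                          // 2
        Console.WriteLine(items[1] == new DateTime(2020, 1, 1)); // True
    }
}
```

Because the dictionary is filled while the source is enumerated, this also works with lazily generated sequences such as PartOne.Union(PartTwo).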

Upvotes: 5
