user1344280
user1344280

Reputation:

Intersection of two lists with repetitions

I'm trying to create a function that will give me the intersection of two lists, taking into account that there can be repeated items and I need them in the output.

Console.Write((new[] {1, 2, 2, 3}).Intersect(new[] {1, 2, 2}));

only outputs {1, 2} and what I need the output to be is {1, 2, 2}.

Here is the method I have created:

private static IEnumerable<int> IntersectWithRepetitons(List<int> a, List<int> b)
{
    if (!a.Any() || !b.Any()) return Enumerable.Empty<int>();
    if (a.Count() > b.Count()) return IntersectWithRepetitons(b, a);

    var idx = b.IndexOf(a.First());
    if (idx < 0) return IntersectWithRepetitons(b, a.Skip(1).ToList());

    var tmp1 = new List<int> { a.First() };
    var tmp2 = new List<int>(b);
    tmp2.RemoveAt(idx);
    return tmp1.Concat(IntersectWithRepetitons(tmp2, a.Skip(1).ToList()));
}

I'm sure this can be highly optimized but, my main concern (efficiency wise) is that in order to keep the input lists intact, I have to duplicate the 'b' list when I remove a found item from it:

var tmp2 = new List<int>(b);
tmp2.RemoveAt(idx);

and that will happen for every recursive call. Any thoughts or ideas will be very appreciated. Thanks.

Upvotes: 0

Views: 744

Answers (1)

Servy
Servy

Reputation: 203821

Map one of the sequences to a dictionary of items to the count of times they appear, then for each item in the other sequence, if it's in the collection, and the value of the lookup is greater than zero, yield it and decriment:

public static IEnumerable<T> IntersectWithRepetitons<T>(this IEnumerable<T> first,
    IEnumerable<T> second)
{
    var lookup = second.GroupBy(x => x)
        .ToDictionary(group => group.Key, group => group.Count());
    foreach (var item in first)
        if (lookup.ContainsKey(item) && lookup[item] > 0)
        {
            yield return item;
            lookup[item]--;
        }
}

This ensures that items are yields for each time they are duplicated in both sets.

You could use TryGetValue to remove a few dictionary lookups, but it sacrifices a lot of the method's elegance, so I just didn't have it in me to do that. If you care about performance, it's not a bad thing to change.

Upvotes: 5

Related Questions