E.Beach

Reputation: 1857

LINQ query that reduces a subset of duplicates to a single value within a larger set?

Is there a LINQ operator that collapses each run of consecutive duplicates in a sequence down to a single value?

Example with '4':

Original { 1 2 3 4 4 4 5 6 7 4 4 4 8 9 4 4 4 }
Filtered { 1 2 3 4 5 6 7 4 8 9 4 }

Thanks.

Upvotes: 5

Views: 422

Answers (5)

LukeH

Reputation: 269388

If you're using .NET 4 then you can do this using the built-in Zip method, although I'd probably prefer to use a custom extension method like the one shown in mquander's answer.

// replace "new int[1]" below with "new T[1]" depending on the type of element
var filtered = original.Zip(new int[1].Concat(original),
                            (l, r) => new { L = l, R = r })
                       .Where((x, i) => (i == 0) || !object.Equals(x.L, x.R))
                       .Select(x => x.L);
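
As a rough sketch of what the "new T[1]" comment is hinting at, the same Zip query could be wrapped in a generic extension method. The names ZipExtensions and CollapseRunsWithZip are made up for illustration and do not come from the answer. Note that this approach enumerates the input twice, so it assumes a sequence that can safely be re-enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class; the name is illustrative only.
public static class ZipExtensions
{
    // Pairs each element with its predecessor (a default(T) placeholder
    // stands in for the missing predecessor of the first element), then
    // keeps the first element and every element that differs from its predecessor.
    public static IEnumerable<T> CollapseRunsWithZip<T>(this IEnumerable<T> original)
    {
        return original.Zip(new T[1].Concat(original),
                            (l, r) => new { L = l, R = r })
                       .Where((x, i) => i == 0 || !object.Equals(x.L, x.R))
                       .Select(x => x.L);
    }
}

public static class ZipDemo
{
    public static void Main()
    {
        var original = new[] { "a", "a", "b", "b", "b", "a" };
        // Each run of equal neighbours is reduced to one element.
        Console.WriteLine(string.Join(" ", original.CollapseRunsWithZip()));
    }
}
```

The i == 0 check matters: without it, a sequence whose first element happens to equal default(T) (0 for int, null for reference types) would lose that element.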

Upvotes: 0

Cheng Chen

Reputation: 43523

Yes, there is! It takes one line of code and a single pass over the array.

int[] source = new int[] { 1, 2, 3, 4, 4, 4, 5, 6, 7, 4, 4, 4, 8, 9, 4, 4, 4 };
var result = source.Where((item, index) => index + 1 == source.Length 
                          || item != source[index + 1]);

Following @Hogan's advice, it can be improved:

var result = source.Where((item, index) => index == 0 
                          || item != source[index - 1]);

More readable now, I think. It means "choose the first element, and those that aren't equal to the previous one".

Upvotes: 4

Rei Miyasaka

Reputation: 7106

Similar to svick's answer, except with side effects to avoid the cons and reverse:

int[] source = new int[] { 1, 2, 3, 4, 4, 4, 5, 6, 7, 4, 4, 4, 8, 9, 4, 4, 4 };

List<int> result = new List<int> { source.First() };
source.Aggregate((acc, c) =>
    {
        if (acc != c)
            result.Add(c);
        return c;
    });

Edit: No longer needs the source.First() as per mquander's concern:

int[] source = new int[] { 1, 2, 3, 4, 4, 4, 5, 6, 7, 4, 4, 4, 8, 9, 4, 4, 4 };

List<int> result = new List<int>();
result.Add(
    source.Aggregate((acc, c) =>
    {
        if (acc != c)
            result.Add(acc);
        return c;
    })
);

I think I still like Danny's solution the most.

Upvotes: 3

svick

Reputation: 244827

You can use Aggregate() (although I'm not sure whether it's better than the non-LINQ solution):

var ints = new[] { 1, 2, 3, 4, 4, 4, 5, 6, 7, 4, 4, 4, 8, 9, 4, 4, 4 };

var result = ints.Aggregate(
    Enumerable.Empty<int>(),
    (list, i) =>
        list.Any() && list.First() == i
        ? list
        : new[] { i }.Concat(list)).Reverse();

I think it's O(n), but I'm not completely sure.

Upvotes: 1

mqp

Reputation: 71945

Not really. I'd write this:

// Must be declared inside a static class to work as an extension method.
public static IEnumerable<T> RemoveDuplicates<T>(this IEnumerable<T> sequence)
{
    bool init = false;
    T current = default(T);

    foreach (var x in sequence)
    {
        if (!init || !object.Equals(current, x))
            yield return x;

        current = x;
        init = true;
    }   
}
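
For completeness, here is a minimal, self-contained sketch of how an iterator like this could be packaged and called on the data from the question. The class name SequenceExtensions and the method name RemoveConsecutiveDuplicates are invented for this example, and it swaps object.Equals for EqualityComparer<T>.Default, which is an assumption made here to avoid boxing value types, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class; extension methods have to live in a static class.
public static class SequenceExtensions
{
    // Yields the first element of each run of consecutive equal elements.
    public static IEnumerable<T> RemoveConsecutiveDuplicates<T>(this IEnumerable<T> sequence)
    {
        bool init = false;          // false until the first element has been seen
        T current = default(T);     // the previously seen element

        foreach (var x in sequence)
        {
            // Emit the first element, and any element that differs from its predecessor.
            if (!init || !EqualityComparer<T>.Default.Equals(current, x))
                yield return x;

            current = x;
            init = true;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var original = new[] { 1, 2, 3, 4, 4, 4, 5, 6, 7, 4, 4, 4, 8, 9, 4, 4, 4 };
        var filtered = original.RemoveConsecutiveDuplicates().ToArray();

        Console.WriteLine(string.Join(" ", filtered));
        // prints: 1 2 3 4 5 6 7 4 8 9 4
    }
}
```

Because it only needs a single forward pass and no indexer, this version works on any IEnumerable<T>, including lazily generated sequences, and it is evaluated lazily itself.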

Upvotes: 5
