Dan Hall

Reputation: 1534

Search all List items for specific duplicate and non duplicate values

I'm looking for a Linq query which can do the following:

  1. For every object in a List check to see if any of the objects have 2 fields set as the same value

  2. For every duplicate set identified check to see if a third field is different for any of them

  3. If #1 and #2 are satisfied then return true (or a positive count, just some way to see whether the data is duplicated)

Here is an example of objects that would satisfy the criteria for the required search:

oObject1 {    data1 = "cat",    data2 = "dog",    data3 = "DE" }
oObject2 {    data1 = "cat",    data2 = "dog",    data3 = "FR" }

The following are not to be returned as being 'duplicate':

oObject3 {    data1 = "cat",    data2 = "dog",    data3 = "DE" }
oObject4 {    data1 = "cat",    data2 = "dog",    data3 = "DE" }

So far I can obtain duplicates with the following query:

    var lDuplicates = lstObjects.GroupBy(x => new { x.data1, x.data2})
           .Where(x => x.Skip(1).Any());

What I need is to extend the query above so that it also checks whether data3 differs within each set of duplicates. Does anyone have any idea how this might be achieved?

Upvotes: 3

Views: 118

Answers (2)

Me.Name

Reputation: 12544

Starting from the result you have now, instead of checking whether there are more items (Skip(1).Any()), you can check whether any item has a data3 that differs from the data3 of the first item. This is a bit easier in query syntax, where the first data3 can be assigned to an intermediate variable:

  var lDuplicates = from x in lstObjects
    group x by new { x.data1, x.data2} into g //g now contains groups with unique data1 and data2 combinations
    let first = g.First().data3 //assign the first data3 to an intermediate variable
    where g.Skip(1).Any(x=>x.data3 != first) //check if there are any entries that have a deviating data3
    select g;

The above selects all groups that correspond to the criteria (it can be flattened in the same query if required).
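Since the question only asks for a yes/no answer (or a count), the same idea can be reduced to a single value. The following is a minimal sketch in method syntax, not a definitive implementation: it uses named tuples as a stand-in for the real object type, and the "cow"/"pig" rows and the variable names `conflictingGroups` and `hasConflicts` are made up for illustration.

```csharp
using System;
using System.Linq;

// Sample data (assumed): named tuples stand in for the real objects.
var lstObjects = new[]
{
    (data1: "cat", data2: "dog", data3: "DE"),
    (data1: "cat", data2: "dog", data3: "FR"), // same data1/data2, different data3
    (data1: "cow", data2: "pig", data3: "DE"),
    (data1: "cow", data2: "pig", data3: "DE"), // exact duplicate, must not count
};

// Count the data1/data2 groups that contain more than one distinct data3.
int conflictingGroups = lstObjects
    .GroupBy(x => new { x.data1, x.data2 })
    .Count(g => g.Select(x => x.data3).Distinct().Skip(1).Any());

bool hasConflicts = conflictingGroups > 0;

Console.WriteLine($"{hasConflicts} ({conflictingGroups} group(s))"); // True (1 group(s))
```

A group with a single element has only one distinct data3, so the separate "more than one item" check from the original query is not needed here.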

But this also means a group can contain two "DE"s as long as there is at least one non-"DE" entry. I'm not sure whether that is the requirement. To get each data1/data2/data3 combination only once (flattened):

  var lDuplicates = from x in lstObjects
    group x by new { x.data1, x.data2} into g //g now contains groups with unique data1 and data2 combinations
    let d3 = g.Select(x=>x.data3).Distinct().ToList() //a list of unique data3 properties
    where d3.Count() > 1 //only with more than one unique data3
    from data3 in d3
    select new{g.Key.data1,g.Key.data2, data3}; //create a new object

NB: the above creates a new object, because when several objects match it is unclear which one to return (the real objects may have more properties than data1, data2 and data3). To select the first object per 'data3'-group instead:

  var lDuplicates = from x in lstObjects
    group x by new { x.data1, x.data2} into g //g now contains groups with unique data1 and data2 combinations
    let d3 = g.GroupBy(x=>x.data3) //create a subgroup for data3 (per group g)
    where d3.Count() > 1 //only with multiple data3
    from gx in d3 //flatten d3 groups
    select gx.First(); //select the first object in the d3 subgroup
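To make the "first object per data3" behaviour concrete, here is a small sketch that runs the last query on sample data. The `id` field and the sample rows are assumptions added only so that it is visible which object is picked; they are not part of the question.

```csharp
using System;
using System.Linq;

// Sample data (assumed); id is a hypothetical extra property.
var lstObjects = new[]
{
    (id: 1, data1: "cat", data2: "dog", data3: "DE"),
    (id: 2, data1: "cat", data2: "dog", data3: "DE"),
    (id: 3, data1: "cat", data2: "dog", data3: "FR"),
    (id: 4, data1: "cow", data2: "pig", data3: "DE"),
};

var lDuplicates = from x in lstObjects
    group x by new { x.data1, x.data2 } into g // one group per data1/data2 combination
    let d3 = g.GroupBy(x => x.data3)           // subgroup per data3 value
    where d3.Count() > 1                       // keep groups with several data3 values
    from gx in d3                              // flatten the subgroups
    select gx.First();                         // first object of each subgroup

var pickedIds = lDuplicates.Select(x => x.id).ToList();

Console.WriteLine(string.Join(", ", pickedIds)); // 1, 3
```

Object 2 is dropped because it shares its data3 with object 1, and object 4 is dropped because its data1/data2 group has only one data3 value.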

Upvotes: 0

vasil oreshenski

Reputation: 2836

You're close. What you need is to expand the sequences in each group and create new groups from them by data3.

When queries tend to get complex I use the query syntax. If I understood the requirement correctly, this might be a step in the right direction.

var queryResult = 
    from obj in lstObjects
    group obj by new { obj.data1, obj.data2 } into outerGroup
    where outerGroup.Skip(1).Any()
    let additionalCheckGroup = (from g in outerGroup
                                group g by g.data3 into innerGroup
                                where innerGroup.Skip(1).Any() == false
                                select innerGroup)
    from innerGroup in additionalCheckGroup
    select new
    {
        outerKey = outerGroup.Key,
        innerKey = innerGroup.Key,
    };

The query returns one row for each data3 value that is NOT duplicated within a group of data1/data2 duplicates, and an empty sequence when there are no such values.

So for the first example: it will yield ->

[0]: { outerKey = {{ data1 = cat, data2 = dog }}, innerKey = "FR" }
[1]: { outerKey = {{ data1 = cat, data2 = dog }}, innerKey = "DE" }

For the second example: it will yield -> empty sequence.
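The two cases above can be checked with a short sketch. It uses a condensed restatement of the query (the inner grouping is written with `GroupBy` instead of a nested query expression), named tuples in place of the real objects, and a hypothetical helper `CountRows`; none of these names come from the question.

```csharp
using System;
using System.Linq;

// Hypothetical helper: runs the condensed query and returns the number of rows.
int CountRows((string data1, string data2, string data3)[] lstObjects)
{
    var queryResult =
        from obj in lstObjects
        group obj by new { obj.data1, obj.data2 } into outerGroup
        where outerGroup.Skip(1).Any()                  // data1/data2 occurs more than once
        from inner in outerGroup.GroupBy(o => o.data3)  // subgroup per data3
        where !inner.Skip(1).Any()                      // data3 occurs exactly once
        select new { outerKey = outerGroup.Key, innerKey = inner.Key };
    return queryResult.Count();
}

int first = CountRows(new[]
{
    (data1: "cat", data2: "dog", data3: "DE"),
    (data1: "cat", data2: "dog", data3: "FR"),
});

int second = CountRows(new[]
{
    (data1: "cat", data2: "dog", data3: "DE"),
    (data1: "cat", data2: "dog", data3: "DE"),
});

// Extra case (assumed data): DE twice and FR once.
int mixed = CountRows(new[]
{
    (data1: "cat", data2: "dog", data3: "DE"),
    (data1: "cat", data2: "dog", data3: "DE"),
    (data1: "cat", data2: "dog", data3: "FR"),
});

Console.WriteLine($"{first} {second} {mixed}"); // 2 0 1
```

The third case shows a consequence of the "exactly once" filter: when a data3 value is repeated inside a group, only the non-repeated values are reported, so a plain `queryResult.Any()` may not be enough if every data3 value appears at least twice.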

NOTE: The result is flat, meaning it returns a sequence of elements rather than groups. I was not sure what result you expected.

Let me know if you have any questions in the comments.

Upvotes: 2
