Reputation: 1534
I'm looking for a LINQ query that does the following:
1. For every object in a List, check whether any of the objects have the same values in two fields.
2. For every set of duplicates identified, check whether a third field differs between any of them.
3. If #1 and #2 are satisfied, return true (or a positive count; just some way to see whether the data is duplicated).
Here is an example of objects that would satisfy the search criteria:
oObject1 { data1 = "cat", data2 = "dog", data3 = "DE" }
oObject2 { data1 = "cat", data2 = "dog", data3 = "FR" }
The following are not to be returned as being 'duplicate':
oObject3 { data1 = "cat", data2 = "dog", data3 = "DE" }
oObject4 { data1 = "cat", data2 = "dog", data3 = "DE" }
So far I can obtain duplicates with the following query:
var lDuplicates = lstObjects.GroupBy(x => new { x.data1, x.data2})
.Where(x => x.Skip(1).Any());
What I need is to extend the query above to also check whether data3 differs within each group. Does anyone have any idea how this might be achieved?
Upvotes: 3
Views: 118
Reputation: 12544
Using the result you have now, instead of checking whether there are more items (Skip(1).Any()), you can check whether any item has a data3 that differs from the data3 of the first item.
This is a bit easier in query syntax, where a let clause can assign the first data3 to a variable:
var lDuplicates = from x in lstObjects
group x by new { x.data1, x.data2} into g //g now contains groups of objects sharing the same data1 and data2
let first = g.First().data3 //assign the first data3 to an intermediate variable
where g.Skip(1).Any(x=>x.data3 != first) //check if there are any entries that have a deviating data3
select g;
The above selects all groups that match the criteria (it can be flattened in the same query if required).
But this also means a group can contain two "DE"s as long as there is at least one non-"DE". Not sure if that is the requirement. To get all objects uniquely (flattened):
var lDuplicates = from x in lstObjects
group x by new { x.data1, x.data2} into g //g now contains groups of objects sharing the same data1 and data2
let d3 = g.Select(x=>x.data3).Distinct().ToList() //a list of unique data3 properties
where d3.Count() > 1 //only with more than one unique data3
from data3 in d3
select new{g.Key.data1,g.Key.data2, data3}; //create a new object
NB: the above creates a new object because, when several objects match, it is unclear which one to use (the objects can have more properties than data1, data2 and data3). To select the first object per data3 subgroup instead:
var lDuplicates = from x in lstObjects
group x by new { x.data1, x.data2} into g //g now contains groups of objects sharing the same data1 and data2
let d3 = g.GroupBy(x=>x.data3) //create a subgroup for data3 (per group g)
where d3.Count() > 1 //only with multiple data3
from gx in d3 //flatten d3 groups
select gx.First(); //select the first object in the d3 subgroup
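Since the question only asks for true/false (or a count), the same idea can be sketched in method syntax. This is a minimal sketch, not a definitive implementation: the `Item` record, the `DuplicateCheck` class and both method names are hypothetical and only mirror the fields from the question, and the record syntax assumes C# 9 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type mirroring the fields used in the question.
public record Item(string data1, string data2, string data3);

public static class DuplicateCheck
{
    // True if any group of items sharing data1 and data2
    // contains more than one distinct data3 value.
    public static bool HasConflictingDuplicates(IEnumerable<Item> items) =>
        items.GroupBy(x => new { x.data1, x.data2 })
             .Any(g => g.Select(x => x.data3).Distinct().Skip(1).Any());

    // Number of (data1, data2) groups that contain
    // more than one distinct data3 value.
    public static int CountConflictingGroups(IEnumerable<Item> items) =>
        items.GroupBy(x => new { x.data1, x.data2 })
             .Count(g => g.Select(x => x.data3).Distinct().Skip(1).Any());
}
```

Calling `DuplicateCheck.HasConflictingDuplicates(lstObjects)` would then give the boolean directly. `Distinct().Skip(1).Any()` is used rather than `Count() > 1` so that enumeration can stop as soon as a second distinct data3 is found.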
Upvotes: 0
Reputation: 2836
You're close. What you need is to expand the sequences in each group and create new groups from them by data3.
When queries tend to get complex, I use the query syntax. This might be a step in the right direction, if I understand the question correctly.
var queryResult =
from obj in lstObjects
group obj by new { obj.data1, obj.data2 } into outerGroup
where outerGroup.Skip(1).Any()
let additionalCheckGroup = (from g in outerGroup
group g by g.data3 into innerGroup
where innerGroup.Skip(1).Any() == false
select innerGroup)
from innerGroup in additionalCheckGroup
select new
{
outerKey = outerGroup.Key,
innerKey = innerGroup.Key,
};
The query returns information about the data3 subgroups that contain only a single item (that is, where data3 is NOT duplicated within the outer group), and an empty sequence for the rest.
So for the first example, it will yield ->
[0]: { outerKey = {{ data1 = cat, data2 = dog }}, innerKey = "FR" }
[1]: { outerKey = {{ data1 = cat, data2 = dog }}, innerKey = "DE" }
For the second example, it will yield -> an empty sequence.
NOTE: The result is flat, meaning it returns a sequence of elements rather than groups; I was not sure what result you expected.
Let me know if you have any questions in the comments.
Upvotes: 2