Reputation: 141
I am looking to write a LINQ statement for a simple scenario involving collections. I am trying to avoid duplicate items in a collection based on a parent-child relationship. The data structure and sample code are below.
public class Catalog
{
    public int CatalogId { get; set; }
    public int ParentCatalogId { get; set; }
    public string CatalogName { get; set; }
}
public class Model
{
    public int CatalogId { get; set; }
    public string ItemName { get; set; }
    ...
}
List<Catalog> Catalogs: contains the complete list of parent-child relations, to any depth, for all the catalogs; the root one has ParentCatalogId = null.
List<Model> CollectionA: contains all the items of the child as well as the parent catalogs for a specific catalogId (up to its root).
I need to create a CollectionB from CollectionA that contains the items of the provided catalogId plus the items of all its parents, such that if an item is present in a child catalog, the same item in a parent catalog is ignored. This way there won't be any duplicate items when the same item is available in both the child and the parent.
In terms of code I am trying to achieve something like this
while (catalogId != null)
{
    // Starting from child to parent and ignoring items that are already in CollectionB
    CollectionB.AddRange(
        CollectionA.Where(x => x.CatalogId == catalogId &&
            !CollectionB.Select(y => y.ItemName).Contains(x.ItemName)));
    catalogId = Catalogs.
        Where(x => x.CatalogId == catalogId).
        Select(x => x.ParentCatalogId).
        FirstOrDefault();
}
I know that the Contains clause in the statement above will not work; I only put it there to explain what I am trying to do. I can do this with a foreach loop, but I just want to use LINQ. I am looking for the correct LINQ statement to do this. The sample data is given below, and I would really appreciate some help.
Catalog
ID   ParentId   CatalogName
1    null       CatalogA
2    1          Catalogb
3    1          CatalogC
4    2          CatalogD
5    4          CatalogE
CollectionA
CatalogId   ItemName
5           ItemA
5           ItemB
4           ItemA
4           ItemC
2           ItemA
2           ItemC
1           ItemD
Expected output
CollectionB
CatalogId   ItemName
5           ItemA
5           ItemB
4           ItemC
1           ItemD
Upvotes: 1
Views: 998
Reputation: 10401
LINQ is not designed to traverse hierarchical data structures, as has already been discussed in other questions.
But if you can get the hierarchy of catalogs ordered from child to root, then the problem can be solved with a join and a distinct-by-property step (see "LINQ's Distinct() on a particular property"):
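// Assumes flattenedHierarchyOfCatalogE is ordered from child to root:
// the join keeps that order, so First() picks the model closest to the child.
var modelsForE = (from catalog in flattenedHierarchyOfCatalogE
                  join model in models
                      on catalog.CatalogId equals model.CatalogId
                  select model).
    GroupBy(model => model.ItemName).
    Select(modelGroup => modelGroup.First());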
Or even better - adapt Jon Skeet's answer for distinct.
It solves the duplicates problem but leaves us with another question: how do we get flattenedHierarchyOfCatalogE?
PURE LINQ SOLUTION:
It is not an easy task, but it is not impossible with pure LINQ. Adapting "How to search Hierarchical Data with Linq", we get:
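public static class LinqExtensions
{
    // Yields the source first and then everything the selector reaches from it.
    // With a parent selector this gives child-to-root order, which the
    // GroupBy/First query relies on to prefer the child's items.
    public static IEnumerable<T> Flatten<T>(this T source, Func<T, IEnumerable<T>> selector)
    {
        return new[] { source }
            .Concat(selector(source).SelectMany(c => Flatten(c, selector)));
    }
}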
//...
var catalogs = new Catalog[]
{
    new Catalog(1, 0, "CatalogA"),
    new Catalog(2, 1, "Catalogb"),
    new Catalog(3, 1, "CatalogC"),
    new Catalog(4, 2, "CatalogD"),
    new Catalog(5, 4, "CatalogE")
};
var models = new Model[]
{
    new Model(5, "ItemA"),
    new Model(5, "ItemB"),
    new Model(4, "ItemA"),
    new Model(4, "ItemC"),
    new Model(2, "ItemA"),
    new Model(2, "ItemC"),
    new Model(1, "ItemD")
};
var catalogE = catalogs.SingleOrDefault(catalog => catalog.CatalogName == "CatalogE");
var flattenedHierarchyOfCatalogE = catalogE.Flatten((source) =>
    catalogs.Where(catalog =>
        catalog.CatalogId == source.ParentCatalogId));
Then feed flattenedHierarchyOfCatalogE into the query from the beginning of this answer.
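Putting the pieces together, a minimal self-contained sketch might look like the following. The record types, the HierarchyDemo class and the SelfThenAncestors name are illustrative assumptions rather than part of the original code, and the sketch assumes C# 9 or later (for records) and that 0 marks a catalog with no parent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's classes; 0 means "no parent".
public record Catalog(int CatalogId, int ParentCatalogId, string CatalogName);
public record Model(int CatalogId, string ItemName);

public static class HierarchyDemo
{
    // Yields the source first, then whatever the selector reaches from it,
    // so a "parent" selector produces child-to-root order.
    public static IEnumerable<T> SelfThenAncestors<T>(this T source, Func<T, IEnumerable<T>> selector)
    {
        return new[] { source }
            .Concat(selector(source).SelectMany(c => c.SelfThenAncestors(selector)));
    }

    public static List<Model> ItemsFor(string catalogName, Catalog[] catalogs, Model[] models)
    {
        var start = catalogs.Single(c => c.CatalogName == catalogName);
        var chain = start.SelfThenAncestors(
            s => catalogs.Where(c => c.CatalogId == s.ParentCatalogId));

        // Join keeps the order of the chain and GroupBy keeps first-seen order,
        // so First() is always the model from the catalog closest to the child.
        return (from catalog in chain
                join model in models on catalog.CatalogId equals model.CatalogId
                select model)
            .GroupBy(m => m.ItemName)
            .Select(g => g.First())
            .ToList();
    }

    // Sample data copied from the question.
    public static Catalog[] SampleCatalogs() => new[]
    {
        new Catalog(1, 0, "CatalogA"), new Catalog(2, 1, "Catalogb"),
        new Catalog(3, 1, "CatalogC"), new Catalog(4, 2, "CatalogD"),
        new Catalog(5, 4, "CatalogE")
    };

    public static Model[] SampleModels() => new[]
    {
        new Model(5, "ItemA"), new Model(5, "ItemB"), new Model(4, "ItemA"),
        new Model(4, "ItemC"), new Model(2, "ItemA"), new Model(2, "ItemC"),
        new Model(1, "ItemD")
    };
}
```

Calling HierarchyDemo.ItemsFor("CatalogE", HierarchyDemo.SampleCatalogs(), HierarchyDemo.SampleModels()) should return (5, ItemA), (5, ItemB), (4, ItemC), (1, ItemD), which matches the expected output in the question.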
WARNING: I have added constructors to your classes, so the previous snippet will not compile in your project unless you add them as well:
public Catalog(Int32 catalogId, Int32 parentCatalogId, String catalogName)
{
    this.CatalogId = catalogId;
    this.ParentCatalogId = parentCatalogId;
    this.CatalogName = catalogName;
}
//...
SOMETHING TO CONSIDER
There is nothing wrong with the previous solution (personally, I might have used something less LINQ-heavy, like "Recursive Hierarchy - Recursive Query using Linq"), but whichever solution you prefer, you may run into one problem: it works, but it doesn't use any optimized data structures; it is just a direct search and selection. If your catalogs grow and the queries execute more often, performance may become a problem.
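As one possible way to address the lookup cost, the sketch below indexes the data once and then walks the parent chain with a plain loop instead of pure LINQ. The IndexedCatalogQuery class, its method names and the use of a nullable parent id are assumptions made for illustration; it also assumes the hierarchy has no cycles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedCatalogQuery
{
    // parentOf maps a catalog id to its parent id (null for the root);
    // itemsByCatalog maps a catalog id to the item names stored in it.
    public static List<(int CatalogId, string ItemName)> ItemsFor(
        int catalogId,
        IReadOnlyDictionary<int, int?> parentOf,
        ILookup<int, string> itemsByCatalog)
    {
        var seen = new HashSet<string>();
        var result = new List<(int CatalogId, string ItemName)>();

        int? current = catalogId;
        while (current is int id)
        {
            // HashSet.Add returns false for a name already taken from a child.
            foreach (var name in itemsByCatalog[id])
                if (seen.Add(name))
                    result.Add((id, name));

            // Move to the parent; stop at the root or at an unknown id.
            current = parentOf.TryGetValue(id, out var parent) ? parent : null;
        }
        return result;
    }

    // Builds the indexes from the question's sample data and queries catalog 5.
    public static List<(int CatalogId, string ItemName)> Demo()
    {
        var parentOf = new Dictionary<int, int?>
        {
            [1] = null, [2] = 1, [3] = 1, [4] = 2, [5] = 4
        };
        var itemsByCatalog = new[]
        {
            (5, "ItemA"), (5, "ItemB"), (4, "ItemA"), (4, "ItemC"),
            (2, "ItemA"), (2, "ItemC"), (1, "ItemD")
        }.ToLookup(x => x.Item1, x => x.Item2);

        return ItemsFor(5, parentOf, itemsByCatalog);
    }
}
```

With the indexes built up front, each step up the hierarchy is a dictionary lookup rather than a scan over every catalog, and the duplicate check is a hash-set probe rather than a search through the results collected so far.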
But even if performance is not a problem, the ease of use of your classes is. Ids and foreign keys are good for relational databases but very unwieldy in OO systems. You may want to consider an object-relational mapping for your classes (or creating wrappers (mirrors) for them) that would look something like:
public class Catalog
{
    public Catalog Parent { get; set; }
    public IEnumerable<Catalog> Children { get; set; }
    public string CatalogName { get; set; }
}
public class Model
{
    public Catalog Catalog { get; set; }
    public string ItemName { get; set; }
}
Such classes are far more self-contained, and their hierarchies are much easier to use and traverse. I don't know whether your system is database-driven or not, but you can nonetheless take a look at some object-relational mapping examples and technologies.
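To illustrate why object references make the traversal easier, here is a rough sketch. The CatalogNode class, its ItemNames list and the EffectiveItems method are hypothetical simplifications (items are stored directly on the catalog instead of in a separate Model class), not a prescribed design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: each catalog holds a reference to its parent
// and, for brevity, the names of its own items.
public class CatalogNode
{
    public CatalogNode Parent { get; set; }
    public string CatalogName { get; set; }
    public List<string> ItemNames { get; } = new List<string>();

    // Walks the Parent references from this catalog up to the root.
    public IEnumerable<CatalogNode> SelfAndAncestors()
    {
        for (var node = this; node != null; node = node.Parent)
            yield return node;
    }

    // Items visible from this catalog; the entry nearest to this catalog wins
    // because GroupBy keeps elements in the order they were produced.
    public IEnumerable<(string CatalogName, string ItemName)> EffectiveItems() =>
        SelfAndAncestors()
            .SelectMany(c => c.ItemNames.Select(i => (c.CatalogName, ItemName: i)))
            .GroupBy(x => x.ItemName)
            .Select(g => g.First());
}
```

No id lookups are needed here: following Parent is enough to reach the root, and the LINQ part shrinks to the duplicate-removal step.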
P.S.: LINQ is not a universal tool in the .NET arsenal. No doubt it is a very useful tool that applies in a multitude of situations, but not in every possible one. And if a tool cannot help you solve a problem, then it should either be modified or put aside for a moment.
Upvotes: 1
Reputation: 1730
You are most likely looking for the SelectMany() extension method. A short example of how it can be used to select all the children for comparison (to avoid duplicates) is below:
var col = new[] {
    new { name = "joe", children = new [] {
        new { name = "billy", age = 1 },
        new { name = "sally", age = 4 }
    }},
    new { name = "bob", children = new [] {
        new { name = "megan", age = 10 },
        new { name = "molly", age = 7 }
    }}
};
// Dump() is a LINQPad extension method; outside LINQPad, enumerate the result instead.
col.SelectMany(c => c.children).Dump("kids");
For more information, there are a few questions on Stack Overflow about this extension method, and of course you can read the actual MSDN documentation.
Upvotes: 0