Reputation: 22914
I'm looking to transform a flat list into a hierarchical structure. How can I do this in a way that is readable but still performs acceptably, and are there any .NET libraries I can take advantage of? I think this is called a "facet" in some terminologies (in this case, faceting by Industry).
public class Company
{
    public int CompanyId { get; set; }
    public string CompanyName { get; set; }
    public Industry Industry { get; set; }
}

public class Industry
{
    public int IndustryId { get; set; }
    public string IndustryName { get; set; }
    public int? ParentIndustryId { get; set; }
    public Industry ParentIndustry { get; set; }
    public ICollection<Industry> ChildIndustries { get; set; }
}
Now let's say I have a List<Company> and I'm looking to transform it into a List<IndustryNode>:
// Hierarchical data structure
public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}
The resulting object should look like the following once it is serialized:
{
  IndustryName: "Industry",
  ChildIndustryNodes: [
    {
      IndustryName: "Energy",
      ChildIndustryNodes: [
        {
          IndustryName: "Energy Equipment & Services",
          ChildIndustryNodes: [
            { IndustryName: "Oil & Gas Drilling", Hits: 8 },
            { IndustryName: "Oil & Gas Equipment & Services", Hits: 4 }
          ]
        },
        {
          IndustryName: "Oil & Gas",
          ChildIndustryNodes: [
            { IndustryName: "Integrated Oil & Gas", Hits: 13 },
            { IndustryName: "Oil & Gas Exploration & Production", Hits: 5 },
            { IndustryName: "Oil & Gas Refining & Marketing & Transportation", Hits: 22 }
          ]
        }
      ]
    },
    {
      IndustryName: "Materials",
      ChildIndustryNodes: [
        {
          IndustryName: "Chemicals",
          ChildIndustryNodes: [
            { IndustryName: "Commodity Chemicals", Hits: 24 },
            { IndustryName: "Diversified Chemicals", Hits: 66 },
            { IndustryName: "Fertilizers & Agricultural Chemicals", Hits: 22 },
            { IndustryName: "Industrial Gases", Hits: 11 },
            { IndustryName: "Specialty Chemicals", Hits: 43 }
          ]
        }
      ]
    }
  ]
}
Here "Hits" is the number of companies that fall into that group.
To clarify, I need to transform a List<Company> into a List<IndustryNode>, NOT serialize a List<IndustryNode>.
Upvotes: 3
Views: 1102
Reputation: 10427
Try this:
using System.Collections.Generic;
using System.Linq;

// Yields the industry itself and then each of its ancestors, so a company is
// counted in its own industry and in every parent industry above it.
private static IEnumerable<Industry> GetAllIndustries(Industry ind)
{
    for (var current = ind; current != null; current = current.ParentIndustry)
    {
        yield return current;
    }
}

// Builds nodes for the children of i that have at least one hit
// (children without companies are not in counts and are skipped).
private static IndustryNode[] GetChildIndustries(Industry i)
{
    return (i.ChildIndustries ?? new List<Industry>())
        .Where(ii => counts.ContainsKey(ii))
        .Select(ii => new IndustryNode()
        {
            IndustryName = ii.IndustryName,
            Hits = counts[ii],
            ChildIndustryNodes = GetChildIndustries(ii)
        }).ToArray();
}

// Number of companies per industry, ancestors included.
private static Dictionary<Industry, int> counts;

static void Main(string[] args)
{
    List<Company> companies = new List<Company>();
    //...
    var allIndustries = companies.SelectMany(c => GetAllIndustries(c.Industry)).ToList();
    // Group once instead of calling Count() for every distinct industry.
    counts = allIndustries.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count());
    List<IndustryNode> listTop = counts.Keys
        .Where(i => i.ParentIndustry == null)
        .Select(i => new IndustryNode()
        {
            ChildIndustryNodes = GetChildIndustries(i),
            Hits = counts[i],
            IndustryName = i.IndustryName
        })
        .ToList();
}
untested
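A self-contained sketch along similar lines, assuming each Industry has its ParentIndustry navigation property populated and that all companies in the same industry share the same Industry instance (so reference equality works as a dictionary key). `IndustryTreeBuilder` and `Build` are hypothetical names. Each company is counted in its own industry and in every ancestor, and children are found by grouping the counted industries on their parent with `ToLookup` (which accepts a null key for the top level), so ChildIndustries does not need to be loaded. Unlike the sample JSON, every node here, including intermediate ones, carries its rolled-up Hits.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Company
{
    public int CompanyId { get; set; }
    public string CompanyName { get; set; }
    public Industry Industry { get; set; }
}

public class Industry
{
    public int IndustryId { get; set; }
    public string IndustryName { get; set; }
    public int? ParentIndustryId { get; set; }
    public Industry ParentIndustry { get; set; }
    public ICollection<Industry> ChildIndustries { get; set; }
}

public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

// Hypothetical helper: builds the faceted tree from a flat company list.
public static class IndustryTreeBuilder
{
    public static List<IndustryNode> Build(IEnumerable<Company> companies)
    {
        // Count each company once in its own industry and once in every ancestor.
        var hits = new Dictionary<Industry, int>();
        foreach (var company in companies)
        {
            for (var ind = company.Industry; ind != null; ind = ind.ParentIndustry)
            {
                hits.TryGetValue(ind, out int n);
                hits[ind] = n + 1;
            }
        }

        // Group the counted industries by parent; the null key holds the top level.
        var childrenOf = hits.Keys.ToLookup(ind => ind.ParentIndustry);

        // Recursively turn an industry and its counted children into nodes,
        // ordering siblings by name so the output is deterministic.
        IndustryNode ToNode(Industry ind) => new IndustryNode
        {
            IndustryName = ind.IndustryName,
            Hits = hits[ind],
            ChildIndustryNodes = childrenOf[ind]
                .OrderBy(c => c.IndustryName)
                .Select(c => ToNode(c))
                .ToArray()
        };

        return childrenOf[null]
            .OrderBy(c => c.IndustryName)
            .Select(c => ToNode(c))
            .ToList();
    }
}
```

Usage would be `List<IndustryNode> tree = IndustryTreeBuilder.Build(companies);`. Counting is a single pass over each company's ancestor chain, which avoids the quadratic `Count()` per distinct industry.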
Upvotes: 1
Reputation: 13984
Here is some rough code that might get you along the way. I create a map/dictionary index and populate it with the company list, then extract the top-level nodes from the index. Note that there may be edge cases: for example, the index may need to be partially filled up front, because none of your companies seem to reference the top-level nodes directly, so those nodes have to be created some other way.
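using System.Collections.Generic;
using System.Linq;

// Nodes keyed by industry name, plus each entry's parent name so the
// top-level nodes can be picked out afterwards.
Dictionary<string, IndustryNode> index = new Dictionary<string, IndustryNode>();
Dictionary<string, string> parentNames = new Dictionary<string, string>();

public void Insert(Company company)
{
    string name = company.Industry.IndustryName;
    if (index.TryGetValue(name, out IndustryNode existing))
    {
        existing.Hits++;
    }
    else
    {
        IndustryNode node = new IndustryNode
        {
            IndustryName = name,
            Hits = 1,
            ChildIndustryNodes = new IndustryNode[0]
        };
        index[name] = node;
        Industry parent = company.Industry.ParentIndustry;
        parentNames[name] = parent?.IndustryName;
        // Attach to the parent only if it is already indexed (see the edge case above).
        if (parent != null && index.TryGetValue(parent.IndustryName, out IndustryNode parentNode))
        {
            parentNode.ChildIndustryNodes = parentNode.ChildIndustryNodes.Append(node).ToArray();
        }
    }
}

public List<IndustryNode> GetTopLevelNodes()
{
    return index.Values
        .Where(node => parentNames[node.IndustryName] == null)
        .ToList();
}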
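One way to handle the edge case of top-level nodes that no company references is to create a node for an industry and, on demand, for each of its missing ancestors, linking every new node to its parent as soon as it is created so insertion order no longer matters. This is a sketch under assumptions: `IndustryIndex`, `GetOrCreate` and `Build` are hypothetical names, nodes are keyed by IndustryId rather than by name, ParentIndustry must be populated, and (as in the code above) Hits counts only companies assigned directly to an industry. The classes are trimmed copies of the question's, keeping only the members used.

```csharp
using System.Collections.Generic;

// Trimmed copies of the question's classes (only the members used here).
public class Company { public Industry Industry { get; set; } }

public class Industry
{
    public int IndustryId { get; set; }
    public string IndustryName { get; set; }
    public Industry ParentIndustry { get; set; }
}

public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

// Hypothetical index: children are kept in mutable lists until Build()
// copies them into the IndustryNode arrays.
public class IndustryIndex
{
    private readonly Dictionary<int, IndustryNode> nodes = new Dictionary<int, IndustryNode>();
    private readonly Dictionary<int, List<IndustryNode>> children = new Dictionary<int, List<IndustryNode>>();
    private readonly List<IndustryNode> roots = new List<IndustryNode>();

    public void Insert(Company company)
    {
        GetOrCreate(company.Industry).Hits++;
    }

    // Creates the node for an industry (and, recursively, any missing
    // ancestors) the first time it is seen, linking it to its parent at once.
    private IndustryNode GetOrCreate(Industry industry)
    {
        if (nodes.TryGetValue(industry.IndustryId, out IndustryNode existing))
            return existing;

        var node = new IndustryNode { IndustryName = industry.IndustryName };
        nodes[industry.IndustryId] = node;
        children[industry.IndustryId] = new List<IndustryNode>();

        if (industry.ParentIndustry == null)
        {
            roots.Add(node);
        }
        else
        {
            GetOrCreate(industry.ParentIndustry);
            children[industry.ParentIndustry.IndustryId].Add(node);
        }
        return node;
    }

    // Copies the child lists into the arrays and returns the top-level nodes.
    public List<IndustryNode> Build()
    {
        foreach (var entry in nodes)
            entry.Value.ChildIndustryNodes = children[entry.Key].ToArray();
        return roots;
    }
}
```

Ancestors created this way start with Hits = 0; if they should show rolled-up totals instead, incrementing Hits on every node along the parent chain in Insert would do that.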
Upvotes: 0
Reputation: 1460
Try using a JSON serializer for this. Your data structure looks fine; this is just a matter of serialization.
using System.Web.Script.Serialization; // JavaScriptSerializer (System.Web.Extensions.dll)

var industryNodeInstance = LoadIndustryNodeInstance();
var json = new JavaScriptSerializer().Serialize(industryNodeInstance);
If you want to choose between serializers, see these benchmarks: http://www.servicestack.net/benchmarks/#burningmonk-benchmarks
The LoadIndustryNodeInstance method would:
1. Build a List<Industry>.
2. Convert it into the industry tree, a List<IndustryNode>.
3. Implement tree methods such as Traverse (look into tree data structures in C#).
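As an illustration of step 3, here is a minimal sketch of a depth-first, pre-order Traverse over a list of IndustryNode roots. `IndustryTree` and `Traverse` are hypothetical names, not a library API. It yields each node with its depth and uses an explicit stack instead of recursion, pushing children in reverse so they come out in their original order; a null ChildIndustryNodes is treated as a leaf.

```csharp
using System;
using System.Collections.Generic;

public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

public static class IndustryTree
{
    // Pre-order walk: a node is yielded before its children (depth 0 = top level).
    public static IEnumerable<(IndustryNode Node, int Depth)> Traverse(IEnumerable<IndustryNode> roots)
    {
        var stack = new Stack<(IndustryNode Node, int Depth)>();
        var rootList = new List<IndustryNode>(roots);
        for (int i = rootList.Count - 1; i >= 0; i--)
            stack.Push((rootList[i], 0));

        while (stack.Count > 0)
        {
            var (node, depth) = stack.Pop();
            yield return (node, depth);

            // Push children in reverse so the first child is popped next.
            var kids = node.ChildIndustryNodes ?? Array.Empty<IndustryNode>();
            for (int i = kids.Length - 1; i >= 0; i--)
                stack.Push((kids[i], depth + 1));
        }
    }
}
```

For example, `foreach (var (node, depth) in IndustryTree.Traverse(tree)) Console.WriteLine(new string(' ', depth * 2) + node.IndustryName);` would print an indented outline of the tree.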
Upvotes: 0