parliament

Reputation: 22914

Transforming data from flat array into hierarchical structure

I'm looking to transform data from a flat list into a hierarchical structure. How can I accomplish this in a way that is readable but still performs acceptably, and are there any .NET libraries I can take advantage of? I think this is considered a "facet" in certain terminologies (in this case, by Industry).

public class Company
{        
    public int CompanyId { get; set; }
    public string CompanyName { get; set; }
    public Industry Industry { get; set; }
}

public class Industry
{
    public int IndustryId { get; set; }
    public string IndustryName { get; set; }
    public int? ParentIndustryId { get; set; }
    public Industry ParentIndustry { get; set; }
    public ICollection<Industry> ChildIndustries { get; set; }
}

Now let's say I have a List<Company> and I want to transform it into a List<IndustryNode>:

//Hierarchical data structure
public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

Once serialized, the resulting object should look like this:

{
    IndustryName: "Industry",
    ChildIndustryNodes: [
        {
            IndustryName: "Energy",
            ChildIndustryNodes: [
                {
                    IndustryName: "Energy Equipment & Services",
                    ChildIndustryNodes: [
                        { IndustryName: "Oil & Gas Drilling", Hits: 8 },
                        { IndustryName: "Oil & Gas Equipment & Services", Hits: 4 }
                    ]
                },
                {
                    IndustryName: "Oil & Gas",
                    ChildIndustryNodes: [
                        { IndustryName: "Integrated Oil & Gas", Hits: 13 },
                        { IndustryName: "Oil & Gas Exploration & Production", Hits: 5 },
                        { IndustryName: "Oil & Gas Refining & Marketing & Transportation", Hits: 22 }
                    ]
                }
            ]
        },
        {
            IndustryName: "Materials",
            ChildIndustryNodes: [
                {
                    IndustryName: "Chemicals",
                    ChildIndustryNodes: [
                        { IndustryName: "Commodity Chemicals", Hits: 24 },
                        { IndustryName: "Diversified Chemicals", Hits: 66 },
                        { IndustryName: "Fertilizers & Agricultural Chemicals", Hits: 22 },
                        { IndustryName: "Industrial Gases", Hits: 11 },
                        { IndustryName: "Specialty Chemicals", Hits: 43 }
                    ]
                }
            ]
        }
    ]
}

Here, "Hits" is the number of companies that fall into that group.

To clarify: I need to transform a List<Company> into a List<IndustryNode>, NOT serialize a List<IndustryNode>.

Upvotes: 3

Views: 1102

Answers (4)

Ahmed KRAIEM

Reputation: 10427

Try this:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Yields the industry itself followed by each ancestor up to the root, so a company
    // is counted toward its own industry and every parent group above it.
    private static IEnumerable<Industry> GetSelfAndAncestors(Industry ind)
    {
        for (var current = ind; current != null; current = current.ParentIndustry)
        {
            yield return current;
        }
    }

    // Builds nodes only for child industries that at least one company falls under.
    private static IndustryNode[] GetChildIndustries(Industry i)
    {
        return (i.ChildIndustries ?? Enumerable.Empty<Industry>())
            .Where(ii => counts.ContainsKey(ii))
            .Select(ii => new IndustryNode()
            {
                IndustryName = ii.IndustryName,
                Hits = counts[ii],
                ChildIndustryNodes = GetChildIndustries(ii)
            }).ToArray();
    }

    // Keyed by Industry instance (reference equality), so all companies must
    // reference the same Industry objects.
    private static Dictionary<Industry, int> counts;

    static void Main(string[] args)
    {
        List<Company> companies = new List<Company>();
        //...
        // Single pass: number of companies under each industry, ancestors included.
        counts = companies
            .SelectMany(c => GetSelfAndAncestors(c.Industry))
            .GroupBy(ind => ind)
            .ToDictionary(g => g.Key, g => g.Count());

        List<IndustryNode> listTop = counts.Keys
            .Where(i => i.ParentIndustry == null)
            .Select(i => new IndustryNode()
            {
                ChildIndustryNodes = GetChildIndustries(i),
                Hits = counts[i],
                IndustryName = i.IndustryName
            })
            .ToList();
    }

untested
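To check the approach end to end, here is a self-contained sketch that runs as a console program. It counts each company toward its own industry and every ancestor (walking ParentIndustry upward), so the parent groups get Hits too, and it leaves out industries that no company falls under. The Industry/Company types are trimmed-down stand-ins for the question's classes, `AddChild` is a hypothetical helper for building sample data, and industries are grouped by reference, so all companies must share the same Industry instances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down versions of the question's types (only what this sketch needs).
public class Industry
{
    public string IndustryName { get; set; }
    public Industry ParentIndustry { get; set; }
    public List<Industry> ChildIndustries { get; } = new List<Industry>();
}

public class Company
{
    public string CompanyName { get; set; }
    public Industry Industry { get; set; }
}

public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

public static class IndustryTree
{
    // Counts every company toward its own industry and all of its ancestors,
    // then builds nodes top-down, skipping industries with no companies.
    public static List<IndustryNode> Build(IEnumerable<Company> companies)
    {
        Dictionary<Industry, int> counts = companies
            .SelectMany(c => SelfAndAncestors(c.Industry))
            .GroupBy(i => i) // grouped by reference: companies must share Industry instances
            .ToDictionary(g => g.Key, g => g.Count());

        IndustryNode ToNode(Industry i) => new IndustryNode
        {
            IndustryName = i.IndustryName,
            Hits = counts[i],
            ChildIndustryNodes = i.ChildIndustries.Where(counts.ContainsKey).Select(ToNode).ToArray()
        };

        return counts.Keys.Where(i => i.ParentIndustry == null).Select(ToNode).ToList();
    }

    private static IEnumerable<Industry> SelfAndAncestors(Industry industry)
    {
        for (Industry i = industry; i != null; i = i.ParentIndustry)
            yield return i;
    }

    // Hypothetical helper for the demo: adds a child industry under 'parent'.
    public static Industry AddChild(Industry parent, string name)
    {
        var child = new Industry { IndustryName = name, ParentIndustry = parent };
        parent.ChildIndustries.Add(child);
        return child;
    }

    private static void Print(IndustryNode node, int depth)
    {
        Console.WriteLine($"{new string(' ', depth * 2)}{node.IndustryName}: {node.Hits}");
        foreach (IndustryNode child in node.ChildIndustryNodes)
            Print(child, depth + 1);
    }

    public static void Main()
    {
        var energy = new Industry { IndustryName = "Energy" };
        var equipment = AddChild(energy, "Energy Equipment & Services");
        var drilling = AddChild(equipment, "Oil & Gas Drilling");
        var services = AddChild(equipment, "Oil & Gas Equipment & Services");
        AddChild(energy, "Oil & Gas"); // no companies here, so it is left out

        var companies = new[]
        {
            new Company { CompanyName = "A", Industry = drilling },
            new Company { CompanyName = "B", Industry = drilling },
            new Company { CompanyName = "C", Industry = services },
        };

        foreach (IndustryNode root in Build(companies))
            Print(root, 0);
        // Energy: 3
        //   Energy Equipment & Services: 3
        //     Oil & Gas Drilling: 2
        //     Oil & Gas Equipment & Services: 1
    }
}
```

The recursive local function keeps the build step short; for very deep hierarchies an explicit stack would avoid deep recursion.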

Upvotes: 1

CookieOfFortune

Reputation: 13984

Here is a sketch that might get you along the way. I create a map/dictionary index and populate it with the company list, then extract the top-level nodes from the index. Note that there may be edge cases: for example, none of your companies seem to reference the very top-level nodes directly, so those have to be filled in some other way (or the index partially filled up front).

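using System.Collections.Generic;

// Nodes keyed by industry name. Children are collected in lists while inserting,
// because IndustryNode.ChildIndustryNodes is an array; it is filled in at the end.
Dictionary<string, IndustryNode> index = new Dictionary<string, IndustryNode>();
Dictionary<string, List<IndustryNode>> children = new Dictionary<string, List<IndustryNode>>();
List<IndustryNode> topLevelNodes = new List<IndustryNode>();

public void Insert(Company company)
{
    // Walk from the company's industry up to the root: bump Hits on existing nodes
    // and create missing ones (which also creates the top-level nodes).
    for (Industry ind = company.Industry; ind != null; ind = ind.ParentIndustry)
    {
        if (index.TryGetValue(ind.IndustryName, out IndustryNode node))
        {
            node.Hits++;
            continue;
        }
        node = new IndustryNode { IndustryName = ind.IndustryName, Hits = 1 };
        index[ind.IndustryName] = node;
        if (ind.ParentIndustry == null)
        {
            topLevelNodes.Add(node);
        }
        else
        {
            string parentName = ind.ParentIndustry.IndustryName;
            if (!children.TryGetValue(parentName, out List<IndustryNode> siblings))
                children[parentName] = siblings = new List<IndustryNode>();
            siblings.Add(node);
        }
    }
}

// Call once after inserting every company: copy the child lists into the arrays.
public List<IndustryNode> GetTopLevelNodes()
{
    foreach (var kvp in index)
        kvp.Value.ChildIndustryNodes = children.TryGetValue(kvp.Key, out var list)
            ? list.ToArray()
            : new IndustryNode[0];
    return topLevelNodes;
}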

Upvotes: 0

Alexandr

Reputation: 1460

Try using a JSON serializer for this. I see that your data structure is OK; this is just a matter of serialization.

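using System.Web.Script.Serialization; // JavaScriptSerializer lives in System.Web.Extensions.dll (.NET Framework)

var industryNodeInstance = LoadIndustryNodeInstance();

var json = new JavaScriptSerializer().Serialize(industryNodeInstance);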

If you want to choose between serializers, see these benchmarks: http://www.servicestack.net/benchmarks/#burningmonk-benchmarks

The LoadIndustryNodeInstance method would:

  • Build a List<Industry>

  • Convert it into the tree: IndustryTree = List<IndustryNode>

  • Implement tree methods, such as Traverse (see how tree data structures are commonly implemented in C#)
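For the Traverse step, here is a minimal sketch, assuming IndustryNode as declared in the question. Only the name Traverse comes from this answer; making it an extension method, using an explicit stack instead of recursion, and yielding a (Node, Depth) tuple are my own assumptions. It walks the tree depth-first in pre-order, so each parent comes before its children, in their original order.

```csharp
using System;
using System.Collections.Generic;

public class IndustryNode
{
    public string IndustryName { get; set; }
    public double Hits { get; set; }
    public IndustryNode[] ChildIndustryNodes { get; set; }
}

public static class IndustryNodeExtensions
{
    // Hypothetical Traverse: pre-order, depth-first walk with an explicit stack,
    // so very deep trees cannot overflow the call stack. Yields each node with its depth.
    public static IEnumerable<(IndustryNode Node, int Depth)> Traverse(this IndustryNode root)
    {
        var stack = new Stack<(IndustryNode, int)>();
        stack.Push((root, 0));
        while (stack.Count > 0)
        {
            var (node, depth) = stack.Pop();
            yield return (node, depth);

            // Leaves may have a null array; treat that as "no children".
            IndustryNode[] children = node.ChildIndustryNodes ?? Array.Empty<IndustryNode>();
            // Push in reverse so the children are popped in their original order.
            for (int i = children.Length - 1; i >= 0; i--)
                stack.Push((children[i], depth + 1));
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var tree = new IndustryNode
        {
            IndustryName = "Energy Equipment & Services",
            ChildIndustryNodes = new[]
            {
                new IndustryNode { IndustryName = "Oil & Gas Drilling", Hits = 8 },
                new IndustryNode { IndustryName = "Oil & Gas Equipment & Services", Hits = 4 }
            }
        };

        // Indent each line by its depth to print the hierarchy.
        foreach (var (node, depth) in tree.Traverse())
            Console.WriteLine($"{new string(' ', depth * 2)}{node.IndustryName} ({node.Hits})");
    }
}
```

A recursive iterator would be shorter, but it creates a nested enumerator per level; the explicit stack keeps the cost per node constant.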

Upvotes: 0

CodeChops

Reputation: 2058

You are looking for a serializer. Microsoft has one built into the .NET Framework, but I prefer Newtonsoft's (Json.NET), which is free. The Microsoft documentation and examples are here; the Newtonsoft documentation is here.

Newtonsoft is free, easy to use, and faster.

Upvotes: 0
