Phil-ZXX

Reputation: 3265

Merge std::unordered_map iteratively

I have a list of nodes that each decompose into more nodes. For example

Therefore, we have Node0 = w01*w12 * Node2 + w03 * Node3 + w01*w14 * Node4.


My C++ code for performing the above aggregation/decomposition/merging for a given set of weight decompositions looks as follows. However, I feel there are a lot of optimisations to be made. To name just one, I am looping over the keys of topWeights and collecting them in topNodeNames, which seems terribly inefficient.

Are there any STL algorithms that could help me speed this up, and possibly avoid unnecessary copying?

#include <string>
#include <unordered_map>
#include <vector>

template<class T, class U> using umap = std::unordered_map<T, U>;


umap<std::string, double> getWeights(const std::string& nodeName, const umap<std::string, umap<std::string, double>>& weightTrees)
{
    const auto it = weightTrees.find(nodeName);
    if (it == weightTrees.end())
        return umap<std::string, double>();

    umap<std::string, double> topWeights = it->second;
    std::vector<std::string> topNodeNames;

    for (const auto& kv : topWeights)
        topNodeNames.push_back(kv.first);

    for (const std::string& topNodeName : topNodeNames)
    {
        umap<std::string, double> subWeights = getWeights(topNodeName, weightTrees);
        if (subWeights.size() > 0)
        {
            const double topWeight = topWeights[topNodeName];
            topWeights.erase(topNodeName);
            for (const auto& subWeight : subWeights)
            {
                const auto it = topWeights.find(subWeight.first);
                if (it == topWeights.end())
                    topWeights[subWeight.first] = topWeight * subWeight.second;
                else
                    it->second += topWeight * subWeight.second;
            }
        }
    }

    return topWeights;
}


int main()
{
    umap<std::string, umap<std::string, double>> weightTrees = {{ "Node0", {{ "Node1",0.5 },{ "Node2",0.3 },{ "Node3",0.2 }} },
                                                                { "Node1", {{ "Node2",0.1 },{ "Node4",0.9 }} }};

    umap<std::string, double> w = getWeights("Node0", weightTrees); // gives {Node2: 0.35, Node3: 0.20, Node4: 0.45}
}

Upvotes: 3

Views: 798

Answers (2)

btilly

Reputation: 46408

I would suggest doing a topological sort followed by a dynamic programming algorithm. Standard versions of a topological sort using Kahn's algorithm take time O(V+E). (If that link goes stale, you can just use Google to find another.) In your case V is the number of nodes, and E is the number of terms that appear in all of your expressions.

If that sort fails, then you have found a circular dependency. Discovering it that way is better than having your code blow up.

Once you have that sort, then going from the end to the front with DP is very straightforward.

Also, if you're truly concerned with performance, one of your performance constraints is that every operation is done using string comparisons. Throwing around lots of strings is easy and convenient - that's why scripting languages do it all the time. However, it is also slow. I have found it worthwhile in the past to create a lookup structure that turns strings into indexes before entering performance-critical code, and then to pass around some type of int instead of a string. At the end, use the lookup to turn the indexes back into strings.
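
A rough sketch of such a lookup structure is below. `NameIndex`, `Edge` and `toIndexed` are hypothetical names chosen for this example, not anything from a library; the point is only that strings are hashed once up front and the hot code then works on plain indexes into vectors.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: maps each distinct node name to a dense index and back.
class NameIndex
{
public:
    // Returns the index of `name`, assigning the next free index on first use.
    std::size_t idOf(const std::string& name)
    {
        const auto it = ids_.find(name);
        if (it != ids_.end())
            return it->second;
        const std::size_t id = names_.size();
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }

    // Turns an index back into the original name.
    const std::string& nameOf(std::size_t id) const { return names_.at(id); }

    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, std::size_t> ids_;
    std::vector<std::string> names_;
};

// A (child index, weight) pair.
using Edge = std::pair<std::size_t, double>;

// Converts the string-keyed decompositions into index-keyed child lists.
// Entry i of the result holds the children of the node with index i;
// leaves simply get an empty list.
std::vector<std::vector<Edge>> toIndexed(
    const std::unordered_map<std::string, std::unordered_map<std::string, double>>& trees,
    NameIndex& index)
{
    std::vector<std::vector<Edge>> children;
    for (const auto& node : trees)
    {
        const std::size_t parent = index.idOf(node.first);
        for (const auto& child : node.second)
        {
            const std::size_t childId = index.idOf(child.first);
            if (children.size() < index.size())
                children.resize(index.size());
            children[parent].emplace_back(childId, child.second);
        }
    }
    children.resize(index.size()); // covers nodes with no children
    return children;
}
```

The aggregation itself can then run on `std::vector<double>` accumulators indexed by node id, and `nameOf` is only needed when reporting the final result.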

Upvotes: 2

Max Langhof

Reputation: 23691

The main problem is that you are recursing for every node to every subnode, which is generally highly redundant. One way to avoid this would be to introduce an order on the node names, where "higher" nodes depend only on "lower" nodes, and then calculate them in reverse order (for each node you'll already know all child weights exactly). However, I don't think there are std algorithms that will find this order for you, because you can't transitively determine node dependencies cheaply ("does node X depend on node Y? If it's not directly, we might have to search the entire tree...").

So, you could just go the dynamic programming route and store nodes that you have fully calculated somewhere. Or even better - you could just flatten the entire tree down to leaf-only weights as you traverse it. As long as you retain the flattening throughout the recursion, this is actually quite elegant in recursive form:

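#include <iostream>
#include <string>
#include <unordered_map>

// Same alias as in the question; main() below uses it.
template<class T, class U> using umap = std::unordered_map<T, U>;

using NodeWeights = std::unordered_map<std::string, double>;
using NonLeaves = std::unordered_map<std::string, NodeWeights>;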

// Modifies the tree so that the given root has no non-leaf children.
void flattenTree(const std::string& root, NonLeaves& toFlatten)
{
    auto rootIt = toFlatten.find(root);
    if (rootIt == toFlatten.end())
        return;

    NodeWeights& rootWeights = rootIt->second;

    NodeWeights leafOnlyWeights;

    for (auto kvp : rootWeights)
    {
        const std::string& childRoot = kvp.first;
        double childWeight = kvp.second;

        std::cout << "Checking child " << childRoot << std::endl;

        // If the graph is indeed acyclic, then the root kvp here is untouched
        // by this call (and thus references to it are not invalidated).
        flattenTree(childRoot, toFlatten);

        auto childIt = toFlatten.find(childRoot);

        // The child is a leaf after flattening: Do not modify anything.
        if (childIt == toFlatten.end())
        {
            leafOnlyWeights[childRoot] = childWeight;
            continue;
        }

        // Child is still not a leaf (but all its children are now leaves):
        // Redistribute its weight among our other child weights.
        const NodeWeights& leafWeights = childIt->second;
        for (auto leafKvp : leafWeights)
            leafOnlyWeights[leafKvp.first] += childWeight * leafKvp.second;
    }

    rootWeights = leafOnlyWeights;
}

int main()
{
    umap<std::string, umap<std::string, double>> weightTrees = {{ "Node0", {{ "Node1",0.5 },{ "Node2",0.3 },{ "Node3",0.2 }} },
                                                                { "Node1", {{ "Node2",0.1 },{ "Node4",0.9 }} }};

    auto flattenedTree = weightTrees;
    flattenTree("Node0", flattenedTree);

    umap<std::string, double> w = flattenedTree["Node0"]; // Should give {Node2: 0.35, Node3: 0.20, Node4: 0.45}

    for (auto kvp : w)
      std::cout << kvp.first << ": " << kvp.second << std::endl;
}


Since each node is flattened at most once, you cannot run into the exponential runtime your original algorithm has.

Upvotes: 2
