Robert
Robert

Reputation: 8609

Graph algorithm - Looking to improve scalability

I've written an algorithm which calculates and stores all paths of a DAG, and it works nicely on small graphs - but now i'm looking to improve it's efficiency to run over larger graphs. The core logic of the algorithm is in createSF() and makePathList() below, the other methods are helpers - I can see that append is a bottleneck. However, I guess the biggest help would be to devise a data structure that can store paths in a dictionary, since many of the paths are composed of other paths, this is the crux of my question.

private Multiset<String> paths = new Multiset<String>();    

public Multiset<String> createSF(DAGNode n) {

    for (DAGNode succ : n.getSuccessors())
        createSF(succ);
    if (!n.isVisited())
        for (String s : makePathList(n)) 
            paths.put(s);

    n.setVisited(true);
    return paths;
}

private List<String> makePathList(DAGNode n) {
    List<String> list = new ArrayList<String>();

    list.add(n.getLabel());
    for (DAGNode node : n.getSuccessors())
        list.addAll(append(n.getLabel(), makePathList(node)));

return list;
}

private List<String> append(String s, List<String> src) {
    List<String> ls = new ArrayList<String>();
    for (String str : src) 
    ls.add(s + "/" + str);

    return ls;
}

EDIT:

I'm now using a path object to represent paths and have pin-pointed the bottle neck to these two methods:

public List<Path> createPathList(Tree n) {
    List<Path> list = new ArrayList<Path>();
    list.add(new Path(n.getNodeName()));
    for (Tree node : n.getSuccessors()) {
        list.addAll(append(n.getNodeName(), createPathList(node)));
    }
    return list;
}

public List<Path> append(String s, List<Path> src) {
    List<Path> ls = new ArrayList<Path>();
    for (Path path : src) {
        ls.add(new Path(path, s));
    }
    return ls;
}

Trouble is when a graph is size M these methods will be called M times, this means there is a lot of lists being created here... is there a more efficient way to build up the return for createPathList()?

Upvotes: 1

Views: 537

Answers (4)

Martin
Martin

Reputation: 12403

A simple modification (depending on how you use the data) might be to load the paths lazily, that way if you tend to only use a few paths you'll never even generate some paths.

Of course, this depends entirely on expected use

Upvotes: 0

Andreas Dolk
Andreas Dolk

Reputation: 114767

Please allow me to put two (hopefully helpful) comments first:

I had some difficulties understanding your code, because some of the method names misled me. From looking at the names, I expected something else. May I suggest some refactoring:

makePathList -> createPathList  // you actually create a List here
append -> createPathList // yes, same name as above because it creates the same
                         // type of list, just with different parameters

(removed part of the answer that became obsolete after Robert's edit)

As Margus said, replacing the String concatenation with a StringBuilder append chain doesn't increase your performance. Compilers may optimize String concatenations to StringBuilder appends anyway (I've seen such byte code).

You could try to convert the DAG into a tree structure. Introduce an invisible root with all nodes as direct children. Now for each node add it's successors (child and/or sibling). The number of leaves now should be equal to the number of path and every graph from the root to any leaf is one path in the DAG.

Edit

A small improvement - it's micro-optimization but at least it will leave less garbage:

private List<String> append(String node, List<String> allPathsStartingAfterNode) {
    List<String> allPathsStartingAtNode = new ArrayList<String>();
    String nodeWithSeparator = node + "/";

    for (String aPathStartingAfterNode : allPathsStartingAfterNode) {
        allPathsStartingAtNode.add(nodeWithSeparator + aPathStartingAfterNode);
    }

    return allPathsStartingAtNode;
}

Upvotes: 0

Anna
Anna

Reputation: 4149

In order to answer this question, it is necessary to understand why do you need the list of paths. The list of paths does not give you any additional information over what you have in the DAG representation.

If you want to calculate things for each path separately, or calculate something like sum/min/max over all paths, it too could be done using the DAG itself.

If you do insist on saving separate paths, one option would be to convert your DAG into a variant of a Trie. Another option could be to use some variant of the Lempel-Ziv representation. It depends on your DAG types, and what you expect to do with the paths information.

Upvotes: 5

Robert S. Barnes
Robert S. Barnes

Reputation: 40558

Take a look at the DOT source code from Graphviz, it might give you some ideas.

Upvotes: 0

Related Questions