Iulian
Iulian

Reputation: 1536

Speeding up Titan graph traversals

I have been playing around with a very small graph of 264,346 vertices and 733,846 edges. I bulk imported it into Titan using BatchGraph and the recommended settings. This worked out fine - it is surprisingly efficient.

I am now playing around with graph traversals and implemented a quick Dijkstra's algorithm in Java using the Java libraries. I should mention I have a local Cassandra node that my Titan database is running on - it is not the embedded one.

An average Dijkstra run takes 1+ minute for an average path in this graph. This is extremely long. I would expect a very slow Dijkstra run on such a graph to take under 10 seconds (with an in-memory graph average queries on this graph size would take well under 1 second).

What are the best practices for running such algorithms over Titan efficiently?

I will give parts of the simple Dijkstra implementation in case the way I am accessing vertices and edges is not the most efficient way.

Getting the graph instance:

TitanGraph graph = TitanFactory.open("cassandra:localhost");

Parts of the Dijkstra implementation (specifically involving graph access):

public int run(long src, long trg){
    this.pqueue.add(new PQNode(0, src));
    this.nodeMap.put(src, new NodeInfo(src, 0, -1));
    int dist = Integer.MAX_VALUE;
    while (!this.pqueue.isEmpty()){
        PQNode current = this.pqueue.poll();
        NodeInfo nodeInfo = this.nodeMap.get(current.getNodeId());
        long u = nodeInfo.getNodeId();

        if (u == trg) {
            dist = nodeInfo.getDistance();
            break;
        }

        if (nodeInfo.isSeen())
            continue;

        this.expansion++;

        TitanVertex vertex = graph.getVertex(u);
        for (TitanEdge out: vertex.getEdges()){
            Direction dir = out.getDirection(vertex);
            if (dir.compareTo(Direction.OUT) != 0 && dir.compareTo(Direction.BOTH) != 0){
                continue;
            }

            TitanVertex v = out.getOtherVertex(vertex);
            long vId = (long)v.getId();
            NodeInfo vInfo = this.nodeMap.get(vId);
            if (vInfo == null){
                vInfo = new NodeInfo(vId, Integer.MAX_VALUE, -1);
                this.nodeMap.put(vId, vInfo);
            }
            int weight = out.getProperty("weight");
            int currentDistance = nodeInfo.getDistance() + weight;
            if (currentDistance < vInfo.getDistance()){
                vInfo.setParent(u);
                vInfo.setDistance(currentDistance);
                this.pqueue.add(new PQNode(currentDistance, vId));
            }
        }
        nodeInfo.setSeen(true);
    }


    return dist;
}

How should I proceed trying to execute such algorithms efficiently over Titan?

Upvotes: 1

Views: 425

Answers (1)

stephen mallette
stephen mallette

Reputation: 46206

Forgetting about your code/algorithm, I have to agree that as you say, your graph is "very small". That begs the question why you chose to use cassandra as your backend. For a graph of that size, I would try out berkeleydb to see if that helps. Or perhaps if you want to go really extreme just use the fastest Blueprints implementation there is: TinkerGraph!

As for the code sample itself, one suggestion might be to stop iterating all edges on a vertex if you don't need to:

for (TitanEdge out: vertex.getEdges()){
    Direction dir = out.getDirection(vertex);
    if (dir.compareTo(Direction.OUT) != 0 && dir.compareTo(Direction.BOTH) != 0){
            continue;
    }

Instead if you just want in" edges tell Titan that: vertex.getEdges(Direction.IN). That should give you some extra efficiency there - I'd even do that if you decided to just load to TinkerGraph. That would also eliminate the need for that getOtherVertex line because then you know to just out.getVertex(Direction.OUT) which I'd guess to be faster.

Upvotes: 0

Related Questions