Reputation: 3
I have a graph database with 150 million nodes and a few hundred million relationships.
There are two types of nodes in the network: account node and transaction node. Each account node has a public key and each transaction node has a number (the amount of total bitcoin involved in this transaction).
There are also two types of relationships in the network. Each relationship connects an account node with a transaction node. One type of relationships is "send" and the other type is "receive". Each relationship also has a number to represent how much bitcoin it sends or receives.
This is an example:
(account: publickey = A)-[send: bitcoin=1.0]->(transaction :id = 1, Tbitcoin=1.0)-[receive: bitcoin=0.5]->(account: publickey = B)
(account: publickey = A)-[send: bitcoin=1.0]->(transaction :id = 1, Tbitcoin=1.0)-[receive: bitcoin=0.5]->(account: publickey = C)
As you can imagine, B
or C
can also send or receive bitcoins to or from other accounts which involves many different transactions.
What I wants to do is to find all paths with depth equaling to 4 between two accounts, e.g. A
and C
. I can do this by Cypher although it is slow. It takes about 20mins. My cypher is like this:
start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C")
match p=src-->(t1)-->(r1)-->(t2)-->dest
return count(p);
However, when I try to do that using Java API, I got the OutOfMemoryError
. Here is my function:
public ArrayList<Path> getPathsWithConditionsBetweenNodes(String indexName, String sfieldName, String sValue1, String sValue2,
int depth, final double threshold, String relType){
ArrayList<Path> res = null;
if (isIndexExistforNode(indexName)) {
try (Transaction tx = graphDB.beginTx()) {
IndexManager index = graphDB.index();
Index<Node> accounts = index.forNodes(indexName);
IndexHits<Node> hits = null;
hits = accounts.get(sfieldName, sValue1);
Node src = null, dest = null;
if(hits.iterator().hasNext())
src = hits.iterator().next();
hits = null;
hits = accounts.get(sfieldName, sValue2);
if(hits.iterator().hasNext())
dest = hits.iterator().next();
if(src==null || dest==null){
System.out.println("Either src or dest node is not avaialble.");
}
TraversalDescription td = graphDB.traversalDescription()
.depthFirst();
if (relType.equalsIgnoreCase("send")) {
td = td.relationships(Rels.Send, Direction.OUTGOING);
td = td.relationships(Rels.Receive, Direction.OUTGOING);
} else if (relType.equalsIgnoreCase("receive")) {
td= td.relationships(Rels.Receive,Direction.INCOMING);
td = td.relationships(Rels.Send,Direction.INCOMING);
} else {
System.out
.println("Traverse Without Type Constrain Because Unknown Relationship Type is Provided to The Function.");
}
td = td.evaluator(Evaluators.includingDepths(depth, depth))
.uniqueness(Uniqueness.RELATIONSHIP_PATH)
.evaluator(Evaluators.returnWhereEndNodeIs(dest));
td = td.evaluator(new Evaluator() {
@Override
public Evaluation evaluate(Path path) {
if (path.length() == 0) {
return Evaluation.EXCLUDE_AND_CONTINUE;
} else {
Node node = path.endNode();
if (!node.hasProperty("TBitcoin"))
return Evaluation.INCLUDE_AND_CONTINUE;
double coin = (double) node.getProperty("TBitcoin");
if (threshold!=Double.MIN_VALUE) {
if (coin<=threshold) {
return Evaluation.EXCLUDE_AND_PRUNE;
} else {
return Evaluation.INCLUDE_AND_CONTINUE;
}
} else {
return Evaluation.INCLUDE_AND_CONTINUE;
}
}
}
});
res = new ArrayList<Path>();
int i=0;
for(Path path : td.traverse(src)){
i++;
//System.out.println(path);
//res.add(path);
}
System.out.println();
tx.success();
} catch (Exception e) {
e.printStackTrace();
}
} else {
;
}
return res;
}
Can someone take a look at my function and give me some ideas why it is so slow and will cause out-of-memory error? I set Xmx=15000m
while runing this program.
Upvotes: 0
Views: 119
Reputation: 3749
If I have understood what you are trying to achieve and because you have a direction on your relationship I think that you can get away with something quite simple:
MATCH (src:keys{publickey:'A')-[r:SEND|RECEIVE*4]->(dest:keys{publickey:'C'})
RETURN COUNT(r)
Depending on your data set @FrobberOfBits makes a good point regarding testing equality of intermediaries which you cannot do using this approach, however with just the two transactions you are testing for cases where a Transaction source and destination are the same (r1 <> src
and r1 <> dest
), which may not even be valid in your model. If you were testing 3 or more transactions then things would get more interesting as you might want to exclude paths like (A)-->(T1)-->(B)-->(T2)-->(A)-->(T3)-->(C)
Shameless theft:
MATCH path=(src:keys{publickey:'A')-[r:SEND|RECEIVE*6]->(dest:keys{publickey:'C'})
WHERE ALL (n IN NODES(path)
WHERE (1=LENGTH(FILTER(m IN NODES(path)
WHERE m=n))))
RETURN COUNT(path)
Or traversal (caveat, pseudo code, never used it):
PathExpander expander = PathExapnder.forTypesAndDirections("SEND", OUTGOING, "RECEIVE", OUTGOING)
PathFinder<Path> finder = GraphAlgoFactory.allSimplePaths(expander, 6);
Iterable<Path> paths = finder.findAllPaths(src, dest);
Upvotes: 0
Reputation: 18022
My $0.02 is that you shouldn't do this with java, you should do it with Cypher. But your query needs some work. Here's your basic query:
start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C")
match p=src-->(t1)-->(r1)-->(t2)-->dest
return count(p);
There are at least two problems with this:
Here's how to tighten your query so it should perform much better:
start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C")
match p=src-[:send]->(t1:transaction)-[:receive]->(r1)-[:send]->(t2:transaction)-[:receive]->dest
where r1 <> src and
r1 <> dest
return count(p);
This should prune out a lot of possible edge and node traversals that you're currently doing, that you don't need to be doing.
Upvotes: 0