Reputation: 311

How to improve performance of count query

The goal of the query is to count the vertices and edges it returns. The query is as follows:

g.inject(1).
  union(V().has('property1', 'A').aggregate('v').
        outE().has('property1', 'E').aggregate('e').
        inV().has('property1', 'B').aggregate('v')).
  select('v').dedup().as('vertexCount').
  select('e').dedup().as('edgeCount').
  select('vertexCount','edgeCount').
    by(unfold().count())

Output: vertexCount: 200k, edgeCount: 250k. Time taken: 1.5 min.

To optimize the query, I tried the following:

g.inject(1).
  union(V().has('property1', 'A').as('v1').
        outE().has('property1', 'E').as('e').
        inV().has('property1', 'B').as('v2')).
  select('v1','e','v2').
    by(valueMap().by(unfold())).
  count()

Output: 250k. Time taken: 30 sec. However, it returns only the edge count.

How can we optimize the query to return both the vertex count and the edge count, and also apply a limit on vertices or edges if required?

Upvotes: 2

Answers (1)

stephen mallette

Reputation: 46226

I'm not sure I have anything groundbreaking to offer, but it seems like your second query could get faster just by removing processing that isn't needed for counting:

g.V().has('property1', 'A').
  outE().has('property1', 'E').
  inV().has('property1', 'B').
  count()

I would imagine that if "property1" (for "A") were indexed, removing inject()/union() would allow that index to get a hit. I'm not sure JanusGraph will optimize the query as it is with inject()/union(), and neither seems to serve a purpose. Depending on the nature of "property1" for "E", a vertex-centric index might also be helpful there. The select().by() seems like an unnecessary and potentially costly transform because it enables path tracking and forces an added Map transform, which you just throw away in the count().

Your comment indicates that you need to count the source vertices as well as the edges. Perhaps something like this would work:

gremlin> g.V(1).aggregate('e').by(constant(1)).
......1>   outE().
......2>   inV().count().
......3>   math("(2 * _) + x").
......4>     by().
......5>     by(select('e').unfold().sum()) 
==>7.0

The aggregate() just holds a "1" for each source vertex in a list, which you sum() later in the math() step. Since the number of edges should equal the number of vertices returned by inV(), you can multiply that count by 2 and then add the sum to get the total you are looking for.
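To make the arithmetic concrete, here is a plain-Python model of the same calculation. It is not Gremlin and not part of the query. The edge list is an assumption: it mirrors vertex 1 of TinkerPop's "modern" toy graph, which is consistent with the 7.0 result shown but is not stated in the console output itself.

```python
# Plain-Python model of math("(2 * _) + x"); the data below is assumed.
source_ids = [1]                      # what g.V(1) yields
edges = [(1, 2), (1, 4), (1, 3)]      # (source, target) pairs leaving vertex 1

# aggregate(...).by(constant(1)) stores one 1 per source vertex; sum() adds them.
source_tally = sum(1 for _ in source_ids)

# inV().count() sees one traverser per edge, so it equals the edge count.
in_v_count = sum(1 for src, _dst in edges if src in source_ids)

# Edges and destination vertices each contribute in_v_count, hence the factor 2.
total = 2 * in_v_count + source_tally
print(float(total))  # 7.0
```

The factor of 2 is only valid while every edge leads to a different destination vertex.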

Or, if several edges can point to the same destination vertex, extend the aggregate pattern to the edges and dedup() the inV():

gremlin> g.V(1).aggregate('s').by(constant(1)).
......1>   outE().aggregate('e').by(constant(1)).
......2>   inV().dedup().count().
......3>   math("_ + source + edge").
......4>     by().
......5>     by(select('s').unfold().sum()).
......6>     by(select('e').unfold().sum())  
==>7.0
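The difference between the two formulas only shows up when destinations repeat. The following plain-Python sketch (again a model, not Gremlin) uses a hypothetical edge list in which two edges reach the same vertex, and compares the doubled count with the dedup-based count.

```python
# Hypothetical data: two of the three edges point at vertex 3.
source_ids = [1]
edges = [(1, 2), (1, 3), (1, 3)]      # (source, target) pairs

source_tally = len(source_ids)                        # sum of the 1s stored under 's'
edge_tally = len(edges)                               # sum of the 1s stored under 'e'
distinct_targets = len({dst for _src, dst in edges})  # inV().dedup().count()

naive_total = 2 * edge_tally + source_tally           # counts vertex 3 twice
deduped_total = distinct_targets + source_tally + edge_tally

print(naive_total, deduped_total)  # 7 6
```

This model, like the query, tallies sources and destinations separately, so it assumes the two sets do not overlap. That holds for the question's data, where sources have property1 'A' and destinations have 'B'.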

You could also add filtering if you don't want to count any source vertices that don't match a full path to the destination:

gremlin> g.V(1).filter(outE().has('weight',gt(0)).inV().hasLabel('person','software')).
......1>   aggregate('s').by(constant(1)).
......2>   outE().has('weight',gt(0)).
......3>   aggregate('e').by(constant(1)).
......4>   inV().hasLabel('person','software').dedup().count().
......5>   math("_ + source + edge").
......6>     by().
......7>     by(select('s').unfold().sum()).
......8>     by(select('e').unfold().sum()) 
==>7.0
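As a rough illustration of what the leading filter() changes, this plain-Python model uses hypothetical data in which one source vertex has no edge that satisfies the weight and label conditions. The vertex ids, weights, and labels are made up for the example.

```python
# Hypothetical data: (source, target, weight) edges and target labels.
source_ids = [1, 5]
edges = [(1, 2, 0.5), (1, 3, 0.0), (5, 4, 0.0)]
labels = {2: "person", 3: "software", 4: "person"}
allowed = {"person", "software"}

def qualifies(edge):
    """Mirror has('weight', gt(0)) followed by hasLabel('person', 'software')."""
    _src, dst, weight = edge
    return weight > 0 and labels[dst] in allowed

# filter(...): keep only sources with at least one fully matching path.
kept_sources = [s for s in source_ids
                if any(e[0] == s and qualifies(e) for e in edges)]
kept_edges = [e for e in edges if e[0] in kept_sources and qualifies(e)]
distinct_targets = {dst for _src, dst, _w in kept_edges}

total = len(distinct_targets) + len(kept_sources) + len(kept_edges)
print(kept_sources, total)  # [1] 3
```

Without the filter, vertex 5 would still be tallied as a source even though none of its edges qualify, and the total would come out one higher.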

Upvotes: 3
