chris

Reputation: 581

Performing aggregate queries using Gremlin / TitanDB

I have a Titan graph database with a set of vertices connected by edges that have a property named "property1".

Is it possible to write a Gremlin (or anything else Titan would support) query to:

Find all edges whose "property1" value occurs 5 or fewer times.

In SQL I would use GROUP BY; in MongoDB I would use one of the aggregation functions.

Is this perhaps a job for Furnace/Faunus?

Upvotes: 1

Views: 1385

Answers (1)

stephen mallette

Reputation: 46206

You can do this by iterating all edges and using groupBy. Here's an example with the toy graph using weight in place of property1:

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.E.groupBy{it.weight}{it}.cap.next()
==>0.5=[e[7][1-knows->2]]
==>1.0=[e[8][1-knows->4], e[10][4-created->5]]
==>0.4=[e[11][4-created->3], e[9][1-created->3]]
==>0.2=[e[12][6-created->3]]
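To make the shape of that result concrete, here is a minimal plain-Python sketch of the same grouping done in memory. It is not Gremlin or a Titan API; the edge tuples are just the toy graph edges copied from the console output above, and the function name `group_by_weight` is hypothetical.

```python
from collections import defaultdict

# Toy graph edges as (id, out_vertex, label, in_vertex, weight),
# copied from the Gremlin console output above.
EDGES = [
    (7, 1, "knows", 2, 0.5),
    (8, 1, "knows", 4, 1.0),
    (9, 1, "created", 3, 0.4),
    (10, 4, "created", 5, 1.0),
    (11, 4, "created", 3, 0.4),
    (12, 6, "created", 3, 0.2),
]

def group_by_weight(edges):
    """Map each weight value to the list of edges carrying it,
    analogous to g.E.groupBy{it.weight}{it}.cap.next()."""
    groups = defaultdict(list)
    for edge in edges:
        weight = edge[4]  # the property being grouped on
        groups[weight].append(edge)
    return dict(groups)

groups = group_by_weight(EDGES)
for weight, members in groups.items():
    print(weight, [e[0] for e in members])  # weight -> edge ids
```

Every edge is visited once and kept in memory inside its group, which is the same cost profile the Gremlin version has.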

The groupBy step groups all edges by their weight. From there you can drop down to standard Groovy functions like findAll to keep only what you want (here I keep only the weights that have more than one edge; in your case the condition would be v.size() <= 5).

gremlin> g.E.groupBy{it.weight}{it}.cap.next().findAll{k,v->v.size()>1}
==>1.0=[e[8][1-knows->4], e[10][4-created->5]]
==>0.4=[e[11][4-created->3], e[9][1-created->3]]
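The same group-then-filter idea, with the question's "5 or fewer" condition, can be sketched in plain Python as below. This only illustrates the logic of groupBy followed by findAll{k,v -> v.size() <= 5}; `rare_value_edges` and its `max_count` parameter are hypothetical names, and the data is again the toy graph from above.

```python
from collections import defaultdict

# Toy graph edges as (id, out_vertex, label, in_vertex, weight).
EDGES = [
    (7, 1, "knows", 2, 0.5),
    (8, 1, "knows", 4, 1.0),
    (9, 1, "created", 3, 0.4),
    (10, 4, "created", 5, 1.0),
    (11, 4, "created", 3, 0.4),
    (12, 6, "created", 3, 0.2),
]

def rare_value_edges(edges, max_count=5):
    """Return {weight: edges} for weights carried by at most max_count edges."""
    groups = defaultdict(list)
    for edge in edges:
        groups[edge[4]].append(edge)
    # Keep only groups that pass the size test, like findAll on the Map.
    return {w: es for w, es in groups.items() if len(es) <= max_count}

# No weight in the toy graph occurs more than twice, so max_count=1 is
# used here to show the filter actually dropping groups.
print(rare_value_edges(EDGES, max_count=1))
```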

This is an expensive operation on a really large graph: you have to iterate over every edge, and you have to build up a Map in memory that could be big depending on how many distinct values property1 has. If you can limit edge iteration with other filters, that will help.
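One way to shrink that in-memory Map (a different technique from the answer's, offered only as a sketch) is to count values first and collect edges in a second pass, so the Map holds one integer per distinct value instead of a list of edges. This assumes you can afford to iterate the edges twice; in Gremlin that would mean two traversals over g.E. The plain-Python version below uses the toy graph data and a hypothetical function name.

```python
from collections import Counter

# Toy graph edges as (id, out_vertex, label, in_vertex, weight).
EDGES = [
    (7, 1, "knows", 2, 0.5),
    (8, 1, "knows", 4, 1.0),
    (9, 1, "created", 3, 0.4),
    (10, 4, "created", 5, 1.0),
    (11, 4, "created", 3, 0.4),
    (12, 6, "created", 3, 0.2),
]

def rare_value_edges_two_pass(edges, max_count=5):
    # Pass 1: count how many edges carry each weight (ints only, no edge lists).
    counts = Counter(edge[4] for edge in edges)
    # Pass 2: keep the edges whose weight occurs at most max_count times.
    return [edge for edge in edges if counts[edge[4]] <= max_count]

print([e[0] for e in rare_value_edges_two_pass(EDGES, max_count=1)])
```

The trade-off is an extra full scan in exchange for a smaller Map; whether that is worth it depends on how large and diverse property1 is.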

With a really large graph, this would be a good job for Faunus. I'll take the easy route here and assume you don't necessarily need the specific edges whose property1 value occurs 5 or fewer times, but just want to know how many times each property1 value occurs. With Faunus you could get a distribution like that with:

g.E.property1.groupCount()
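For intuition, groupCount produces a value-to-count map. A minimal plain-Python equivalent on the toy graph (using weight in place of property1, as in the examples above) looks like this; it is not Faunus code, just the same result shape computed with `collections.Counter`, and the variable names are illustrative.

```python
from collections import Counter

# Toy graph edges as (id, out_vertex, label, in_vertex, weight).
EDGES = [
    (7, 1, "knows", 2, 0.5),
    (8, 1, "knows", 4, 1.0),
    (9, 1, "created", 3, 0.4),
    (10, 4, "created", 5, 1.0),
    (11, 4, "created", 3, 0.4),
    (12, 6, "created", 3, 0.2),
]

# value -> number of edges carrying it, the same shape as groupCount()
distribution = Counter(edge[4] for edge in EDGES)

# Values that occur 5 or fewer times (every value, on this tiny graph).
rare_values = sorted(v for v, n in distribution.items() if n <= 5)
print(distribution, rare_values)
```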

Upvotes: 1
