Liang Chen

Reputation: 21

How to speed up a two-hop query in TitanDB with Cassandra

I am currently testing TitanDB with Cassandra. The graph schema looks like this:

VERTEX: USER(userId), IP(ip), SESSION_ID(sessionId), DEVICE(deviceId)
EDGE: USER->IP, USER->SESSION_ID, USER->DEVICE
DATA SIZE: 100 million vertices, 1 billion edges
INDEX: vertex-centric index on every edge label; indexes on userId, ip, sessionId, and deviceId

Vertex partitioning is set for IP, DEVICE, and SESSION_ID, with 32 partitions in total.

Cassandra hosts: 24 AWS EC2 I2 (2xlarge) instances. Currently, each host holds about 30 GB of data.

Use case: given a userId and an edge label, find all related users through the vertices those edges point to. For example: g.V().has(T.label, 'USER').has('USER_ID', '12345').out('USER_IP').in().valueMap();

But this kind of query is pretty slow, sometimes taking hundreds of seconds. One user can have many related IPs (hundreds), and those IPs can in turn lead to lots of USERs (thousands).

Does Titan run this kind of query in parallel against all partitions of the storage backend? I tried using limit:

g.V().has(T.label, 'USER').has('USER_ID', '12345').out('USER_IP').limit(50).in().limit(100).valueMap()

It's also slow. I'd like this kind of query to finish within 5 seconds. How does Titan's limit() work? Does it fetch all results first and then apply the limit?

How can I improve the performance of this query? Can anyone give some advice?

Upvotes: 1

Views: 205

Answers (1)

Filipe Teixeira

Reputation: 3565

One quick performance gain could come from using Titan's vertex-centric indices, which allow very quick leaps from one vertex to another. For example, you could try something like this:

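mgmt = graph.openManagement()
// Edge label whose edges should be indexed
userIp = mgmt.getEdgeLabel('USER_IP')
// Sort key for the index; assumes a 'time' property key exists and is set on USER_IP edges
time = mgmt.getPropertyKey('time')
mgmt.buildEdgeIndex(userIp, 'userIdByUserIP', Direction.BOTH, Order.decr, time)
mgmt.commit()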

This creates a simple vertex-centric index.
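As a rough sketch of how a traversal might take advantage of such an index: restricting or ordering the edges by the index's sort key lets Titan read only a slice of a vertex's edges rather than all of them. This assumes the USER_IP edges actually carry a `time` property (the question does not say so) and that the snippet runs in the Gremlin Console, where `outE` and `decr` are already imported. The 'USER_ID' key and the value '12345' are taken from the question's query.

```groovy
// Assumption: USER_IP edges have a 'time' property and the edge index is enabled.
user = g.V().has('USER_ID', '12345').next()

// Walk only the 50 most recent USER_IP edges of this user (served by the
// vertex-centric index), then hop back to the users sharing those IPs.
g.V(user).local(outE('USER_IP').order().by('time', decr).limit(50)).inV().in('USER_IP').limit(100).valueMap()
```

The `local(...)` wrapper applies the ordering and the limit per starting vertex, which is the form that can be answered from the vertex-centric index.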

If you want to look up multiple user IPs from multiple user vertices, you could try Titan-Hadoop. However, that is a more involved process.

Upvotes: 1
