user3244615

Reputation: 430

Gremlin query search by search key across multiple vertex properties

I am trying to write a search query that takes a search key as input and searches across two or more property keys of a vertex, returning the vertices where any of those properties matches the key. For example, assume that I have user vertices in my graph db with the following property keys:

USER_EMAIL
USER_FNAME
USER_LNAME

Now, given a search key 'xyz', I have to find the user vertices where any of the above three property keys matches the value 'xyz'. This is how I have approached the problem:

g.V.has('ENTITY_TYPE', 'USER').or(_().has('USER_EMAIL', TEXT.REGEX, '.*xyz.*'), _().has('USER_FNAME', TEXT.REGEX, '.*xyz.*'), _().has('USER_LNAME', TEXT.REGEX, '.*xyz.*')).dedup();

I have created the required mixed indices (three separate mixed indices) for USER_EMAIL, USER_FNAME and USER_LNAME as follows:

key = m.makePropertyKey("USER_EMAIL").dataType(String.class).make();
m.buildIndex("searchbyemail",Vertex.class).addKey(key).buildMixedIndex("search");

key = m.makePropertyKey("USER_FNAME").dataType(String.class).make();
m.buildIndex("searchbyfname",Vertex.class).addKey(key).buildMixedIndex("search");

key = m.makePropertyKey("USER_LNAME").dataType(String.class).make();
m.buildIndex("typemixed",Vertex.class).addKey(key).buildMixedIndex("search");

This works fine, but I want to know whether this is the best approach to this kind of problem, or whether there is a better way to do it. I am using the Gremlin Java API to write the above query, and dedup() to remove duplicate vertices.

Upvotes: 2

Views: 2488

Answers (1)

Daniel Kuppitz

Reputation: 10904

The three separate indices won't help to answer your query efficiently. It is better to create a single mixed index that covers all three fields (that doesn't mean your query has to have a condition on every field) and issue a direct index query.

Sample graph:

g = TitanFactory.open("conf/titan-cassandra-es.properties")
m = g.getManagementSystem()

user = m.makeVertexLabel("USER").make()
email = m.makePropertyKey("USER_EMAIL").dataType(String.class).make()
fname = m.makePropertyKey("USER_FNAME").dataType(String.class).make()
lname = m.makePropertyKey("USER_LNAME").dataType(String.class).make()

m.buildIndex("users", Vertex.class).addKey(email).addKey(fname).addKey(lname).indexOnly(user).buildMixedIndex("search")
m.commit()

ElementHelper.setProperties(g.addVertexWithLabel("USER"), "USER_EMAIL", "[email protected]", "USER_FNAME", "foo", "USER_LNAME", "bar")
ElementHelper.setProperties(g.addVertexWithLabel("USER"), "USER_EMAIL", "[email protected]", "USER_FNAME", "foo", "USER_LNAME", "bar")
ElementHelper.setProperties(g.addVertexWithLabel("USER"), "USER_EMAIL", "[email protected]", "USER_FNAME", "foo", "USER_LNAME", "xyz")
ElementHelper.setProperties(g.addVertexWithLabel("USER"), "USER_EMAIL", "[email protected]", "USER_FNAME", "xyz", "USER_LNAME", "bar")
ElementHelper.setProperties(g.addVertexWithLabel("USER"), "USER_EMAIL", "[email protected]", "USER_FNAME", "xyz", "USER_LNAME", "xyz")

g.commit()

Direct index query:

gremlin> g.indexQuery("users", 'v."USER_EMAIL":/.*xyz.*/ v."USER_FNAME":/.*xyz.*/ v."USER_LNAME":/.*xyz.*/').vertices()*.getElement()._().map()
==>{USER_FNAME=xyz, USER_LNAME=xyz, [email protected]}
==>{USER_FNAME=xyz, USER_LNAME=bar, [email protected]}
==>{USER_FNAME=foo, USER_LNAME=xyz, [email protected]}
==>{USER_FNAME=foo, USER_LNAME=bar, [email protected]}
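Since the question mentions the Java API, here is a minimal sketch of how the query string passed to g.indexQuery(...) above could be assembled from a list of property keys and a user-supplied search term. The class and method names (IndexQueryBuilder, escapeTerm, containsAny) are hypothetical and not part of Titan. The set of escaped characters is an assumption based on Lucene/Elasticsearch regular expression syntax (the sample config uses an Elasticsearch backend), so check it against the index backend you actually use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds the string for a direct index query.
// It has no Titan dependency and only does string assembly.
final class IndexQueryBuilder {

    // Assumption: these are the characters with special meaning in a
    // Lucene-style regular expression, plus '/' which delimits the regex
    // inside the query string.
    private static final String RESERVED = ".?+*|{}[]()\"\\/#@&<>~";

    // Backslash-escapes reserved characters so the term is matched literally.
    static String escapeTerm(String term) {
        StringBuilder sb = new StringBuilder(term.length() * 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Produces one v."KEY":/.*term.*/ clause per key, separated by spaces,
    // in the same shape as the query string shown in the answer.
    static String containsAny(List<String> keys, String term) {
        String pattern = "/.*" + escapeTerm(term) + ".*/";
        return keys.stream()
                .map(key -> "v.\"" + key + "\":" + pattern)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("USER_EMAIL", "USER_FNAME", "USER_LNAME");
        // prints: v."USER_EMAIL":/.*xyz.*/ v."USER_FNAME":/.*xyz.*/ v."USER_LNAME":/.*xyz.*/
        System.out.println(containsAny(keys, "xyz"));
    }
}
```

The resulting string would then be passed as the second argument of g.indexQuery("users", ...). Escaping the term matters because a search key such as "a.b" would otherwise be interpreted as a regular expression rather than as literal text.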

As you can see, I also replaced ENTITY_TYPE with a vertex label. The label helps to keep your index as small as possible. If, for example, another type of vertex (e.g. PROFILE) also uses the property USER_EMAIL, those vertices won't make it into the index, provided the index was created using .indexOnly(user).

Upvotes: 3
