I have a JanusGraph database with the following schema:
(Journal)<-[PublishedIn]-(Paper)<-[AuthorOf]-(Author)
I'm trying to write a query using the Gremlin match() step that will search for two different journals, the papers in them that have a keyword in the title, and the authors of those papers. Here's what I have so far:
sg = g.V().match(
__.as('a').has('Journal', 'displayName', textContains('Journal Name 1')),
__.as('a').has('Journal', 'displayName', textContains('Journal Name 2')),
__.as('a').inE('PublishedIn').subgraph('sg').outV().as('b'),
__.as('b').has('Paper', 'paperTitle', textContains('My Key word')),
__.as('b').inE('AuthorOf').subgraph('sg').outV().as('c')).
cap('sg').next()
This query runs successfully but returns 0 vertices and 0 edges. If I split the query in two and search for each journal's displayName separately, I get complete graphs, so I assume there's something wrong with the logic or syntax of my query.
If I write the query this way:
sg = g.V().or(has('JournalFixed', 'displayName', textContains('Journal Name 1')),
has('JournalFixed', 'displayName', textContains('Journal Name 2'))).
inE('PublishedInFixed').subgraph('sg').
outV().has('Paper', 'paperTitle', textContains('My Key word')).
inE('AuthorOf').subgraph('sg').
outV().
cap('sg').
next()
It returns a network with around 7000 nodes. How can I rewrite this query to use the match() step?
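I'm not sure this is all of your problem, but I think your match() is modelling your two "displayName" steps as and() rather than or(): both patterns start from the same 'a', so a single Journal vertex would have to satisfy both of them. You can check this with profile(), as I did here with TinkerGraph: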
gremlin> g.V().match(__.as('a').has('name','marko'), __.as('a').has('name','josh')).profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[name.eq(marko), name.eq... 0.067 100.00
>TOTAL - - 0.067 -
You could resolve this in a number of ways, I suppose. For my example, using within(), as described in a different answer to an earlier question of yours, works nicely:
gremlin> g.V().match(__.as('a').has('name', within('marko','josh'))).profile()
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
TinkerGraphStep(vertex,[name.within([marko, jos... 2 2 0.098 100.00
>TOTAL - - 0.098 -
For your case, I would replace:
or(has('JournalFixed', 'displayName', textContains('Journal Name 1')),
has('JournalFixed', 'displayName', textContains('Journal Name 2')))
with:
has('JournalFixed', 'displayName', textContains('Journal Name 1').
  or(textContains('Journal Name 2')))
This essentially takes advantage of P.or(). I think either of these options should be better than using an or()-step up front, but only a profile() of the query on JanusGraph would tell, as discussed here.
All that said, I wonder why your or() could not be translated directly into the match() as follows:
g.V().match(
__.as('a').or(has('Journal', 'displayName', textContains('Journal Name 1')),
has('Journal', 'displayName', textContains('Journal Name 2'))),
__.as('a').inE('PublishedIn').subgraph('sg').outV().as('b'),
__.as('b').has('Paper', 'paperTitle', textContains('My Key word')),
__.as('b').inE('AuthorOf').subgraph('sg').outV().as('c')).
cap('sg')
I'd imagine, though, that my P.or() suggestion is significantly more performant.
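To put both ideas side by side, here is a sketch of the match() query with the journal filter expressed through P.or(), run against a tiny in-memory TinkerGraph that mirrors the question's schema. It assumes TinkerPop 3.4+ (Gremlin Console or a Groovy script with TinkerGraph on the classpath). The vertex data is invented for illustration, and TinkerPop's TextP.containing() stands in for JanusGraph's textContains(), because TinkerGraph doesn't provide JanusGraph's text predicates. The two are not identical (textContains() matches words via a full-text index, TextP.containing() is a plain substring test), so against JanusGraph you would swap textContains() back in.

```groovy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
import org.apache.tinkerpop.gremlin.process.traversal.TextP
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__

// Hypothetical toy data: (Journal)<-[PublishedIn]-(Paper)<-[AuthorOf]-(Author)
graph = TinkerGraph.open()
g = graph.traversal()
j1 = g.addV('Journal').property('displayName', 'Journal Name 1').next()
j2 = g.addV('Journal').property('displayName', 'Journal Name 2').next()
p1 = g.addV('Paper').property('paperTitle', 'Notes on My Key word').next()
p2 = g.addV('Paper').property('paperTitle', 'My Key word revisited').next()
a1 = g.addV('Author').property('name', 'alice').next()
a2 = g.addV('Author').property('name', 'bob').next()
g.addE('PublishedIn').from(p1).to(j1).iterate()
g.addE('PublishedIn').from(p2).to(j2).iterate()
g.addE('AuthorOf').from(a1).to(p1).iterate()
g.addE('AuthorOf').from(a2).to(p2).iterate()

// Two has() patterns on the same 'a' act like and(): no journal matches both names
andCount = g.V().match(
    __.as('a').has('Journal', 'displayName', TextP.containing('Journal Name 1')),
    __.as('a').has('Journal', 'displayName', TextP.containing('Journal Name 2'))).
  count().next()

// One has() pattern whose predicate is combined with P.or() matches either journal
journalFilter = TextP.containing('Journal Name 1').or(TextP.containing('Journal Name 2'))
sg = g.V().match(
    __.as('a').has('Journal', 'displayName', journalFilter),
    __.as('a').inE('PublishedIn').subgraph('sg').outV().as('b'),
    __.as('b').has('Paper', 'paperTitle', TextP.containing('My Key word')),
    __.as('b').inE('AuthorOf').subgraph('sg').outV().as('c')).
  cap('sg').next()

println "and-style match: ${andCount} results"
println "P.or() subgraph: ${sg.traversal().V().count().next()} vertices, ${sg.traversal().E().count().next()} edges"
```

With this toy data the and-style match finds nothing, while the P.or() version collects both journals, both papers and both authors (6 vertices, 4 edges) into the subgraph.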