Matt
Reputation: 468

Or statement with Match statement in Gremlin

I have a Janusgraph database with the following schema:

(Journal)<-[PublishedIn]-(Paper)<-[AuthorOf]-(Author)

I'm trying to write a query using the Gremlin match() step that finds either of two journals, the papers published in them whose titles contain a keyword, and the authors of those papers. Here's what I have so far:

sg = g.V().match(
    __.as('a').has('Journal', 'displayName', textContains('Journal Name 1')),
    __.as('a').has('Journal', 'displayName', textContains('Journal Name 2')),
    __.as('a').inE('PublishedIn').subgraph('sg').outV().as('b'), 
    __.as('b').has('Paper', 'paperTitle', textContains('My Key word')),
    __.as('b').inE('AuthorOf').subgraph('sg').outV().as('c')).
 cap('sg').next()

This query runs successfully but returns 0 vertices and 0 edges. If I divide the query into two and search for each Journal displayName separately I get complete graphs, so I assume there's something wrong with the logic/syntax of my query.

If I write the query this way:

sg = g.V().or(has('JournalFixed', 'displayName', textContains('Journal Name 1')),
              has('JournalFixed', 'displayName', textContains('Journal Name 2'))).
              inE('PublishedInFixed').subgraph('sg').
              outV().has('Paper', 'paperTitle', textContains('My Key word')).
              inE('AuthorOf').subgraph('sg').
              outV().
              cap('sg').
              next()

It returns a network with around 7000 nodes. How can I re-write this query to use the match() clause?

Upvotes: 0

Views: 608

Answers (1)

stephen mallette
Reputation: 46206

I'm not sure if this is all of your problem, but I think your match() treats the two "displayName" patterns as an and() rather than an or(): both are bound to 'a', so a vertex has to satisfy both of them. You can check with profile(), as I did here with TinkerGraph:

gremlin> g.V().match(__.as('a').has('name','marko'), __.as('a').has('name','josh')).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[name.eq(marko), name.eq...                                             0.067   100.00
                                            >TOTAL                     -           -           0.067        -

You could resolve this in a number of ways, I suppose. For my example, using within(), as described in a different answer to an earlier question from you, works nicely:

gremlin> g.V().match(__.as('a').has('name', within('marko','josh'))).profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[name.within([marko, jos...                     2           2           0.098   100.00
                                            >TOTAL                     -           -           0.098        -

For your case, I would replace:

or(has('JournalFixed', 'displayName', textContains('Journal Name 1')),
   has('JournalFixed', 'displayName', textContains('Journal Name 2')))

with:

has('JournalFixed', 'displayName', textContains('Journal Name 1').
                                   or(textContains('Journal Name 2')))

essentially taking advantage of P.or(). I think either of these options should be better than using an or()-step up front, but only a profile() run against JanusGraph would tell, as discussed here.

All that said, I'd wonder why your or() could not be translated directly into the match() as follows:

g.V().match(
    __.as('a').or(has('Journal', 'displayName', textContains('Journal Name 1')),
                  has('Journal', 'displayName', textContains('Journal Name 2'))),
    __.as('a').inE('PublishedIn').subgraph('sg').outV().as('b'), 
    __.as('b').has('Paper', 'paperTitle', textContains('My Key word')),
    __.as('b').inE('AuthorOf').subgraph('sg').outV().as('c')).
 cap('sg')

I'd imagine though that my suggestion of P.or() is significantly more performant.
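
As a rough way to compare the two forms outside JanusGraph, the sketch below runs both against TinkerGraph's "modern" toy graph in the Gremlin Console. It assumes the console's default static imports (`__`, `eq`), and the `name` property and its values come from that toy graph, not from the schema in the question. Both traversals should select the same vertices:

```groovy
// Assumption: run in the Gremlin Console, which ships with TinkerGraph.
graph = TinkerFactory.createModern()
g = graph.traversal()

// or()-step inside a single match() pattern bound to 'a'
g.V().match(__.as('a').or(has('name', 'marko'), has('name', 'josh'))).
  select('a').values('name')

// P.or() folded into one has() step
g.V().has('name', eq('marko').or(eq('josh'))).values('name')
```

Appending profile() to the equivalent traversals on JanusGraph, with textContains() in place of eq(), would be the way to see how each form is actually executed there.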

Upvotes: 1
