Gilles jr Bisson

Reputation: 533

Gremlin API: how to traverse vertices, collect properties conditionally, and stop when collected properties reach a certain limit

I'm new to graphs and the Gremlin API. It looks promising for my purpose, so I am investigating it, but despite reading the documentation and several tutorials, I can't find how to do this.

A portion of my graph would have vertices representing "documents" with "id" (1, 2, 3, ...) and "author" (a: Luc, Kim, ...) properties. They are linked with simple "next" edges. Like this:

V1{a:'Luc'} -> V2{a:'Kim'} -> V3{a:'Marc'} -> V4{a:'Kim'} -> V5{a:'Luc'} -> V6{a:'Luc'}

What I am trying to do:

So for this example, I would expect the result to be: [2, 4, 5].

From what I found so far, I would have something like:

g.V(2).repeat(somethingThatKeepsTheIdIfAuthorIsInArray(['Luc', 'Kim']).out()).until("I have found 3 ids")

Or, to word it differently: I want to "skip" vertex 3 because its author is not in the list, but keep going until I have found at most 3 vertices that match my condition or I reach the end.

Any idea which step(s) I should be looking at to accomplish this?

Upvotes: 1

Views: 1110

Answers (2)

Kelvin Lawrence

Reputation: 14371

Using this graph

g.addV('Document').property('author','Luc').property(id,'D1').as('d1').
  addV('Document').property('author','Kim').property(id,'D2').as('d2').
  addV('Document').property('author','Marc').property(id,'D3').as('d3').
  addV('Document').property('author','Kim').property(id,'D4').as('d4').
  addV('Document').property('author','Luc').property(id,'D5').as('d5').
  addV('Document').property('author','Luc').property(id,'D6').as('d6').
  addE('NEXT').from('d1').to('d2').
  addE('NEXT').from('d2').to('d3').
  addE('NEXT').from('d3').to('d4').
  addE('NEXT').from('d4').to('d5').
  addE('NEXT').from('d5').to('d6') 

We can write a Gremlin query that tests for certain author names and uses a sideEffect to store the IDs of the vertices that match.


g.V('D2').
  repeat(out('NEXT').sideEffect(has('author',within('Luc','Kim')).id().store('documents'))).
  until(not(out('NEXT'))).
  select('documents')

Which will return

['D4', 'D5', 'D6']

With a small change to the query, we can also test for a maximum number of documents.

g.V('D2').
  repeat(out('NEXT').
         sideEffect(has('author',within('Luc','Kim')).id().store('documents'))).
  until(not(out('NEXT')).or().select('documents').count(local).is(2)).
  select('documents')

Which yields

['D4', 'D5']
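
For readers who want to step through the logic outside a graph database, here is a minimal plain-Python sketch of what the capped query does on the sample chain. It is only a model, not Gremlin: the dictionaries `AUTHORS` and `NEXT` and the function `collect_after` are hypothetical names introduced for illustration, and the loop mirrors the do-while shape of repeat(...).until(...).

```python
# Plain-Python model of the sample chain; not Gremlin and not tied to any graph store.
AUTHORS = {"D1": "Luc", "D2": "Kim", "D3": "Marc", "D4": "Kim", "D5": "Luc", "D6": "Luc"}
NEXT = {"D1": "D2", "D2": "D3", "D3": "D4", "D4": "D5", "D5": "D6"}  # D6 has no NEXT edge


def collect_after(start, wanted, cap):
    """Model of repeat(out('NEXT').sideEffect(...store...)).until(no NEXT or cap reached)."""
    documents = []
    current = start
    while True:
        current = NEXT.get(current)       # out('NEXT')
        if current is None:               # the start vertex had no outgoing edge
            break                         # (this model simply returns an empty list)
        if AUTHORS[current] in wanted:    # has('author', within(...)) inside the sideEffect
            documents.append(current)     # store('documents')
        # until(...): stop at the end of the chain or once the cap is reached
        if current not in NEXT or len(documents) == cap:
            break
    return documents


print(collect_after("D2", {"Luc", "Kim"}, 2))
```

Because the step to the next vertex happens before the author test, the starting vertex is never examined in this variant, which is why D2 is absent from the result.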

To include the starting vertex in the results, we can just put the sideEffect first in the repeat step.

g.V('D2').
  repeat(sideEffect(has('author',within('Luc','Kim')).id().store('documents')).
         out('NEXT')).
  until(not(out('NEXT')).or().select('documents').count(local).is(3)).
  select('documents')

Which yields

['D2', 'D4', 'D5']

UPDATED 2021-11-17

In response to the comments/discussion below, here are two more examples of the query in use. The first starts at D2 with an is(3) and the second starts at D4. These were run using the Gremlin Console and TinkerGraph. While my query works, now that I fully understand your requirements, I think it is a bit over-engineered for your case. The answer you have since posted is, I think, fine for this case. The one caveat, as we discussed in the comments, is that your query as written will fail on most TinkerPop-enabled stores but seems to be working the way you need on CosmosDB. I up-voted your answer :-)

gremlin> g.V('D2').
......1>   repeat(out('NEXT').
......2>          sideEffect(has('author',within('Luc','Kim')).id().store('documents'))).
......3>   until(__.not(out('NEXT')).or().select('documents').count(local).is(3)).
......4>   select('documents')   

==>[D4,D5,D6]   



gremlin> g.V('D4').
......1>   repeat(out('NEXT').
......2>          sideEffect(has('author',within('Luc','Kim')).id().store('documents'))).
......3>   until(__.not(out('NEXT')).or().select('documents').count(local).is(3)).
......4>   select('documents')     

==>[D5,D6]  

Upvotes: 0

Gilles jr Bisson

Reputation: 533

I kept digging and found this to be a possible solution:

g.V('D2').emit(has('author', within('Luc', 'Kim'))).repeat(out()).limit(3).values('id').fold()

Which yields:

['D2', 'D4', 'D5']

I also confirmed that it works correctly no matter which vertex we start from. For example, starting at D5 yields:

['D5', 'D6']

The only concern (question) I have about this is how the limit(3) is applied. Does it actually break the repeat loop once 3 vertices have been emitted, or does the repeat loop traverse all vertices until there is no more "next" edge and THEN truncate the result to return only 3?
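
To make the two possibilities concrete, here is a small plain-Python sketch of the "stop early" behaviour, assuming the engine evaluates the traversal lazily (each step pulls from the previous one only when it needs another result). It is a model only and says nothing definitive about how CosmosDB or any other provider executes the query; `AUTHORS`, `NEXT`, `visited`, `walk` and `first_matches` are hypothetical names for illustration.

```python
from itertools import islice

# Plain-Python model of the sample chain; not Gremlin and not tied to any graph store.
AUTHORS = {"D1": "Luc", "D2": "Kim", "D3": "Marc", "D4": "Kim", "D5": "Luc", "D6": "Luc"}
NEXT = {"D1": "D2", "D2": "D3", "D3": "D4", "D4": "D5", "D5": "D6"}

visited = []  # records every vertex the walk actually touches


def walk(start):
    """Lazily yield the start vertex and then each vertex reached over NEXT."""
    current = start
    while current is not None:
        visited.append(current)
        yield current
        current = NEXT.get(current)


def first_matches(start, wanted, limit):
    """Model of emit(has(...)).repeat(out()).limit(n).fold() under lazy evaluation."""
    matches = (v for v in walk(start) if AUTHORS[v] in wanted)  # emit(has(...))
    return list(islice(matches, limit))                         # limit(n), then fold()


print(first_matches("D2", {"Luc", "Kim"}, 3))
print(visited)
```

In this model the walk stops as soon as the third match has been pulled, so D6 never appears in `visited`. An engine that materialises the whole repeat result before applying the limit would touch D6 as well while returning the same list, so the result alone cannot tell the two behaviours apart.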

Upvotes: 1
