Filtering edges based on information derived from properties of different edges

Question

I'm trying to represent a train schedule timetable as a graph using Tinkerpop3. The nodes of the graph are the train stations and the edges are "schedule elements" that contain all the information about a train going from one station to another. I'm currently struggling to formulate a graph query to find all edges that correspond to trains leaving a station within a time window derived from properties of a different edge.

To illustrate the problem I'm trying to solve, I have set up a small toy graph:

graph = TinkerGraph.open()

g = graph.traversal()

g.addV().property('name', 'London Euston').
  addV().property('name', 'Milton Keynes').
  addV().property('name', 'Stoke-on-Trent').
  addV().property('name', 'Stockport').
  addV().property('name', 'Manchester Piccadilly')

  schedule = [['London Euston', 'Milton Keynes', 
               'T1', 1509537600000, 1509539400000],
              ['Milton Keynes', 'Stoke-on-Trent', 
               'T1', 1509539460000, 1509541200000],
              ['Stoke-on-Trent', 'Stockport', 
               'T1', 1509541260000, 1509543000000],
              ['Stockport', 'Manchester Piccadilly', 
               'T1', 1509543060000, 1509544800000],
              ['London Euston', 'Milton Keynes', 
               'T2', 1509537900000, 1509540000000],
              ['Milton Keynes', 'Stoke-on-Trent', 
               'T2', 1509540060000, 1509542100000],
              ['Stoke-on-Trent', 'Stockport', 
               'T2', 1509542160000, 1509544200000],
              ['Stockport', 'Manchester Piccadilly', 
               'T2', 1509544260000, 1509546600000],
              ['London Euston', 'Milton Keynes', 
               'T3', 1509548400000, 1509550200000],
              ['Milton Keynes', 'Stoke-on-Trent', 
               'T3', 1509550260000, 1509552000000],
              ['Stoke-on-Trent', 'Stockport', 
               'T3', 1509552060000, 1509553800000],
              ['Stockport', 'Manchester Piccadilly', 
               'T3', 1509553860000, 1509555600000]]

for(scheduleInfo in schedule)
{
  v1 = g.V().has('name', scheduleInfo[0]).next()
  v2 = g.V().has('name', scheduleInfo[1]).next()

  v1.addEdge('Schedule', v2, 
             'trainID', scheduleInfo[2], 
             'outTime', scheduleInfo[3], 
             'inTime', scheduleInfo[4])
}

The graph consists of a (made up) schedule of three trains going from London to Manchester. The edges have three properties:

trainID: a train ID
outTime: departure time from outgoing node
inTime: arrival time at incoming node

All times are stored as UNIX timestamps in milliseconds. One of the things that I want to find out from my graph is:

for all stations that train 'T1' leaves from, what other trains also leave those stations?

This turns out to be a fairly easy Gremlin query

g.E().has('trainID','T1').as('e1').
outV().as('station').outE().as('e2').
path().by('trainID').by('name')

The query that I'm struggling with right now, and that inspired this post, is the following:

for all stations that train 'T1' leaves from, how many trains leave the same stations within plus/minus 15 minutes of train 'T1'?

The closest answer that I have been able to come up with so far is this:

g.E().has('trainID','T1').as('e1').
outV().as('station').
outE().as('e2').
path().
filter{it.get().e2.outTime > (it.get().e1.outTime - 60*15*1000L)}.
filter{it.get().e2.outTime < (it.get().e1.outTime + 60*15*1000L)}

This query gets me all the paths that I want, but it uses lambda steps, something the TinkerPop3 documentation explicitly discourages. I'm wondering if there is a different way of writing this query that doesn't use lambda steps. Any help would be greatly appreciated.

Daniel Kuppitz · Accepted Answer

The basic query is:

g.E().has('trainID','T1').
  sack(assign).
    by('outTime').
  outV().as('station').
  outE().has('trainID', neq('T1')).
  sack(minus).
    by('outTime').
  filter(sack().is(between(-60*15*1000L, 60*15*1000L)))

Adding some path labels to create a more readable output:

gremlin> g.E().has('trainID','T1').as('e1').
           sack(assign).
             by('outTime').
           outV().as('station').
           outE().has('trainID', neq('T1')).as('e2').
           sack(minus).
             by('outTime').
           filter(sack().is(between(-60*15*1000L, 60*15*1000L))).
           select('station','e1','e2').each {
             println "${it.get('station').value('name')}"
             println "* T1 departure: ${new Date(it.get('e1').value('outTime'))}"
             println "* ${it.get('e2').value('trainID')} departure: ${new Date(it.get('e2').value('outTime'))}
"
           }; []
London Euston
* T1 departure: Wed Nov 01 05:00:00 MST 2017
* T2 departure: Wed Nov 01 05:05:00 MST 2017

Milton Keynes
* T1 departure: Wed Nov 01 05:31:00 MST 2017
* T2 departure: Wed Nov 01 05:41:00 MST 2017

Stoke-on-Trent
* T1 departure: Wed Nov 01 06:01:00 MST 2017
* T2 departure: Wed Nov 01 06:16:00 MST 2017

Filtering edges based on information derived from properties of different edges

Answers (1)

Related Questions