Reputation: 55
I created a Needham time tree (http://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-a-time-tree-down-to-the-day/) going down to the hour. On each hour node I stored the epoch time
CREATE (h:Hour {hour: hour, day:day, month: month, year:year, time: apoc.date.parse(year +"-"+month+"-"+day+" "+hour+":00", "s", "yyyy-MM-dd hh:mm")})
Now I want to link events that happened within that hour to the hour node. I did this with the following query:
//Create one example event node
create (e:Event {time: apoc.date.parse("2017-11-17 13:15", "s", "yyyy-mm-dd HH:mm")})
with e
match (h:Hour) where h.time <= e.time and (h.time+3600) > e.time
merge (e)-[:IN_HOUR]->(h)
I indexed both the event time and hour time.
This works well for small groups of events but when I scale the events up to the hundreds of thousands range it goes VERY slowly. (On the order of a few hundred to 1000 relationships per hour)
How can I do this faster?
I tried both
match (e:Event) ...`
and using load CSV
to iterate over each event, match it to an existing event node and then create a relationship to the time tree.
`
Upvotes: 0
Views: 74
Reputation: 66967
[EDITED]
First, create an index on :Hour(time)
:
CREATE INDEX ON :Hour(time);
Then, change your query to the following (assuming that you pass the event time as a event_time
parameter:
CREATE (e:Event {time: apoc.date.parse($event_time, "s", "yyyy-MM-dd HH:mm")})
WITH e
MATCH (h:Hour {time: e.time/3600*3600})
USING INDEX h:Hour(timer)
MERGE (e)-[:IN_HOUR]->(h);
The expression e.time/3600*3600
rounds e.time
down to the nearest hour (but is still in seconds units). Since :Hour(time)
is indexed, the MATCH
should be be fast.
NOTE: The USING INDEX
clause is there to give the Cypher Planner a hint that it should take advantage of the index. The planner does not aleways do that. If you the PROFILE your queries, you can see if giving a hint is necessary.
You query cannot make use of the index, since its WHERE
clause is too complex.
Upvotes: 1
Reputation: 55
Here is the solution I landed on:
thanks to cybersam I realized my where clause was too complicated for the indexing to work. Unfortunately cybersam's approach was just as slow. I removed the calculation from the match
by updating the (:Event) nodes by adding an timeHour
property (and indexing it)
match (e:Event) set e.timeHour = e.time/3600*3600
Then I was able to join a few hundred thousand relationship in 3 seconds by doing:
match (e:Event)
match (h:Hour) where h.time = e.timeHour
merge (e)-[:IN_HOUR]->(h)
Upvotes: 0