Lewis Bushman

Reputation: 285

Neo4j / Cypher - pagination with multiple queries

I need to paginate (limit + offset) results from multiple performance intensive queries.

At first, I simulated pagination using a Python generator:

first = 100
# Offset
skip = 50

cursor = 0

# Generator is returned by neo4j so we don't have a significant performance impact
q1_results = tx.run(.....)
q2_results = tx.run(.....)

for result in q1_results:
   if cursor < first:
       yield result
       cursor += 1

for result in q2_results:
   if cursor < first:
       yield result
       cursor += 1

However, the problem here is enforcing the offset: to achieve it programmatically, I have to iterate over the leading results and discard them, like this:

first = 100
# Offset
skip = 50

cursor = 0
skip_cursor = 0

# Generator is returned by neo4j so we don't have a significant performance impact
q1_results = tx.run(.....)
q2_results = tx.run(.....)

for result in q1_results:
    if skip_cursor < skip:
        # Still inside the offset: discard this record
        skip_cursor += 1
    elif cursor < first:
        yield result
        cursor += 1

for result in q2_results:
    if skip_cursor < skip:
        # The offset may extend into the second result set
        skip_cursor += 1
    elif cursor < first:
        yield result
        cursor += 1
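As a minimal sketch, the same skip-then-limit logic over both result streams can be written with the standard library's itertools. The name paginate_streams is hypothetical, and plain ranges stand in for the tx.run(.....) results. The skipped records are still fetched and thrown away, so this only tidies the client-side code.

```python
from itertools import chain, islice


def paginate_streams(streams, skip, first):
    """Treat several iterables as one sequence, drop `skip` items,
    then yield at most `first` items.

    Hypothetical helper for illustration; it is not part of any driver.
    """
    combined = chain.from_iterable(streams)
    # islice consumes and discards the first `skip` items lazily,
    # then stops after `first` further items.
    return islice(combined, skip, skip + first)


# Stand-ins for q1_results and q2_results
q1_results = range(0, 60)
q2_results = range(100, 200)

page = list(paginate_streams([q1_results, q2_results], skip=50, first=100))
```

Here the page starts with the last 10 items of the first stream and continues with the first 90 items of the second.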

Then I tried combining the queries into one big query, but that required aggregation (such as collect and distinct), which had an enormous performance impact and made the queries really slow.

Am I missing something? Is there a proper way to achieve pagination in this scenario?

Upvotes: 1

Views: 539

Answers (1)

Nigel Small

Reputation: 4495

At the moment, the proper way to do this is to use SKIP and LIMIT in your Cypher query. The underlying protocol has no mechanism to return only a portion of a query result, so even with your code you will still generate, send and buffer the entire result set.

We have an item on our roadmap to introduce full flow control, along with a reactive API. This will enable full stack support for incremental delivery of records, with options to skip and cancel the stream. But this is a complex change, so it won't arrive until the end of this year at the earliest. Until then, your best bet is to use Cypher keywords.
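A rough sketch of what that could look like from Python, assuming a transaction object with the run(query, parameters) method used in the question. The helpers paginate and fetch_page, the Person label and the name property are all made up for illustration. An ORDER BY is included because pages are only stable if the query has a defined order.

```python
# Hypothetical example query: the label and property are assumptions.
PAGE_QUERY = "MATCH (p:Person) RETURN p.name AS name ORDER BY name"


def paginate(cypher, skip, limit):
    """Append parameterised SKIP/LIMIT clauses to a Cypher query.

    Hypothetical helper, not part of the neo4j driver.
    """
    if skip < 0 or limit < 0:
        raise ValueError("skip and limit must be non-negative")
    # Drop any trailing semicolon so the clauses can be appended.
    paged = cypher.rstrip().rstrip(";") + " SKIP $skip LIMIT $limit"
    return paged, {"skip": skip, "limit": limit}


def fetch_page(tx, cypher, skip, limit):
    """Run the paginated query in a transaction and return one page."""
    query, params = paginate(cypher, skip, limit)
    # The server applies SKIP and LIMIT, so only this page is sent back.
    return list(tx.run(query, params))
```

fetch_page is meant to be handed to whatever transaction function your driver version provides, for example with skip=50 and limit=100 to match the numbers in the question.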

Upvotes: 2
