Eager

Reputation: 1691

Neo4j: update Cypher query to return every hundredth element

Application Background: I am working on an application that consists of a client side and a server side. The server side consumes data every 5 minutes from an external API feeder, transforms it, and saves it into a Neo4j graph database. The client side fetches all stored data by calling the server side and builds a chart from the received data:

http://decisionwanted.com/decisions/2/bitcoin

More saving details: each time new data is consumed, I create new history value nodes with new relationships to the existing Value (root) node:

Data model

Issue: The server side returns all data stored so far by running the following Cypher query:

MATCH (v:Value)-[rvhv:CONTAINS]->(hv:HistoryValue)
WHERE v.id = {valueId}
OPTIONAL MATCH (hv)-[ru:CREATED_BY]->(u:User)
WHERE {fetchCreateUsers}
RETURN ru, u, rvhv, v, hv
ORDER BY hv.createDate DESC

Since the total data volume grows after each consume operation, query performance degrades and latency increases.

Questions:

  1. At some point, my graph will contain more than 100k history value nodes, which will significantly degrade query performance. Can anyone suggest a better approach to storing or retrieving this kind of data?
  2. Instead of sending all stored data at once, I want to limit the size of the returned data based on a time range received from the client and a step that determines how many elements to skip before the next value that should be returned.

For example: there are 1000 history value nodes stored, and I want to return only every hundredth element, starting from the 1st and ending with the 1000th.

So the result set of the query should contain nodes 1, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000.

The approach looks good to me. The only problem is how to tell the Cypher query:

MATCH (v:Value)-[rvhv:CONTAINS]->(hv:HistoryValue)
WHERE v.id = {valueId}
OPTIONAL MATCH (hv)-[ru:CREATED_BY]->(u:User)
WHERE {fetchCreateUsers}
RETURN ru, u, rvhv, v, hv
ORDER BY hv.createDate DESC

to return only every hundredth element? Does anyone know how to do it?

Upvotes: 1

Views: 263

Answers (1)

cybersam

Reputation: 66967

  1. Because your query is using ORDER BY at the end, Cypher has to generate all result rows and then sort them. If, as stated in question #2, you want to limit the results to a time range, you should filter for that as early as possible to minimize the amount of work. For example, if you are only interested in createDate values within parameterized startDate and endDate values:

    MATCH (v:Value)-[rvhv:CONTAINS]->(hv:HistoryValue)
    WHERE v.id = {valueId} AND {startDate} <= hv.createDate <= {endDate}
    OPTIONAL MATCH (hv)-[ru:CREATED_BY]->(u:User)
    WHERE {fetchCreateUsers}
    RETURN ru, u, rvhv, v, hv
    ORDER BY hv.createDate DESC
    
  2. In addition to performing the above early-filtering, the following query returns a collection of the rows at indexes 0, 100, 200, ..., 1000:

    MATCH (v:Value)-[rvhv:CONTAINS]->(hv:HistoryValue)
    WHERE v.id = {valueId} AND {startDate} <= hv.createDate <= {endDate}
    OPTIONAL MATCH (hv)-[ru:CREATED_BY]->(u:User)
    WHERE {fetchCreateUsers}
    WITH ru, u, rvhv, v, hv
    ORDER BY hv.createDate DESC
    LIMIT 1001
    WITH COLLECT({ru: ru, u: u, rvhv: rvhv, v: v, hv: hv}) AS data
    RETURN REDUCE(s = [], i IN RANGE(0, 1000, 100) | s + data[i]) AS result;
    
    • The LIMIT 1001 clause caps the data collection at 1001 rows (index 1000 corresponds to row 1001).
    • RANGE(0, 1000, 100) is used to generate the indexes of the rows of interest.
    • The REDUCE function is used to generate the resulting collection of data at those indexes.
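
The index selection in the second query can be checked outside the database. Below is a minimal Python sketch of that arithmetic, not the query itself. The helper names `cypher_range` and `sample_rows` and the stand-in row values are hypothetical. It relies on the end bound of Cypher's RANGE being inclusive, which is why RANGE(0, 1000, 100) yields 11 indexes and the query needs LIMIT 1001. The bounds check for short result sets is an addition in this sketch and is not part of the query above.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cypher_range(start: int, end: int, step: int) -> List[int]:
    """Mimic Cypher's RANGE(start, end, step), whose end bound is inclusive."""
    return list(range(start, end + 1, step))


def sample_rows(rows: Sequence[T], last_index: int, step: int) -> List[T]:
    """Pick rows at indexes 0, step, 2*step, ... up to last_index (inclusive).

    Indexes past the end of `rows` are skipped (an assumption of this
    sketch), so a short result set simply yields fewer samples.
    """
    return [rows[i] for i in cypher_range(0, last_index, step) if i < len(rows)]


# 1001 stand-in rows, already sorted and limited as in the query.
rows = [f"hv{i}" for i in range(1001)]
sampled = sample_rows(rows, last_index=1000, step=100)
print(len(sampled))   # 11
print(sampled[:3])    # ['hv0', 'hv100', 'hv200']
```

To make the step configurable, as the question asks, the same arithmetic suggests passing the step and the last index as query parameters and setting the LIMIT to the last index plus one.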

Upvotes: 3
