Dhiraj
Dhiraj

Reputation: 1720

Cypher query for isolating records on time basis

I am trying to do CDR (Call Details Record) Analysis on mobile calls data. Calls are made by a PERSON, THROUGH a tower and CONNECTS to a number. I want to isolate calls that were made prior to a certain date and time and the calling number does not exist after that particular date and time in the records. My current query only shows me data prior to the particular occurrence I am looking for:

MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b)
WHERE toInteger(t.time)<1500399900
RETURN a,b

However, how do I now isolate only those records which exist before t.time=1500399900 and not after that? Also, if I do not limit the above query to say 1000, my browser (Google Chrome), crashes. Any solution for that please?

After running the query as suggested this is what EXPLAIN looks like: enter image description here

If it helps, this is how I loaded the csv file in neo4j:

//Setup initial constraints
CREATE CONSTRAINT ON (a:PERSON) assert a.number is unique;
CREATE CONSTRAINT ON (b:TOWER) assert b.id is unique;

//Create the appropriate nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MERGE (a:PERSON {number: line.Calling})
MERGE (b:PERSON {number: line.Called})
MERGE (c:TOWER {id: line.CellID1})


//Setup proper indexing
DROP CONSTRAINT ON (a:PERSON) ASSERT a.number IS UNIQUE;
DROP CONSTRAINT ON (a:TOWER) ASSERT a.id IS UNIQUE;

CREATE INDEX ON :PERSON(number);
CREATE INDEX ON :TOWER(id);

//Create relationships between people and calls
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MATCH (a:PERSON {number: line.Calling}),(b:PERSON {number: line.Called}),(c:TOWER {id: line.CellID1})
CREATE (a)-[t:THROUGH]->(c)-[x:CONNECTS]->(b)
SET  x.calltype = line.CallType, x.provider = line.Provider, t.time=toInteger(line.ts), t.duration=toInteger(line.Duration)

Upvotes: 0

Views: 107

Answers (1)

Gabor Szarnyas
Gabor Szarnyas

Reputation: 5047

However, how do I now isolate only those records which exist before t.time=1500399900 and not after that?

Let's create a small example data set:

CREATE 
  (a1:PERSON {name: 'a1'}), (a2:PERSON {name: 'a2'}),
  (b1:PERSON {name: 'b1'}), (b2:PERSON {name: 'b2'}),
  (b3:PERSON {name: 'b3'}), (b4:PERSON {name: 'b4'}),
  (a1)-[:THROUGH {time:  1}]->(:TOWER)-[:CONNECTS]->(b1),
  (a1)-[:THROUGH {time:  3}]->(:TOWER)-[:CONNECTS]->(b2),
  (a2)-[:THROUGH {time:  2}]->(:TOWER)-[:CONNECTS]->(b3),
  (a2)-[:THROUGH {time: 15}]->(:TOWER)-[:CONNECTS]->(b4)

It looks like this when visualized:

enter image description here

This query might do the trick for you:

MATCH (a:PERSON)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
WHERE toInteger(t1.time) < 5
OPTIONAL MATCH (a)-[t2:THROUGH]->(:TOWER)
WHERE t2.time >= 5
WITH a, b, t1, t2
WHERE t2 IS NULL
RETURN a, b, t1

After the first match, it looks for calls of PERSON a that were initiated after timestamp 5. There might be no such calls, hence we it uses OPTIONAL MATCH. The value of t2 will be null if there were no calls after the specified timestamp, so we do an IS NULL check and return the filtered results.

Also, if I do not limit the above query to say 1000, my browser (Google Chrome), crashes. Any solution for that please?

If you use the graph visualizer, it usually cannot render more than a few hundred nodes. Possible workarounds:

  • Use the Text view of the web browser that scales better.
  • Paginate by using SKIP ... LIMIT ....

Upvotes: 1

Related Questions