Reputation: 1720
I am trying to do CDR (Call Details Record) Analysis on mobile calls data. Calls are made by a PERSON
, THROUGH
a tower and CONNECTS
to a number. I want to isolate calls that were made prior to a certain date and time and the calling number does not exist after that particular date and time in the records. My current query only shows me data prior to the particular occurrence I am looking for:
MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b)
WHERE toInteger(t.time)<1500399900
RETURN a,b
However, how do I now isolate only those records which exist before t.time=1500399900
and not after that? Also, if I do not limit the above query to say 1000, my browser (Google Chrome), crashes. Any solution for that please?
After running the query as suggested this is what EXPLAIN
looks like:
If it helps, this is how I loaded the csv file in neo4j:
//Setup initial constraints
CREATE CONSTRAINT ON (a:PERSON) assert a.number is unique;
CREATE CONSTRAINT ON (b:TOWER) assert b.id is unique;
//Create the appropriate nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MERGE (a:PERSON {number: line.Calling})
MERGE (b:PERSON {number: line.Called})
MERGE (c:TOWER {id: line.CellID1})
//Setup proper indexing
DROP CONSTRAINT ON (a:PERSON) ASSERT a.number IS UNIQUE;
DROP CONSTRAINT ON (a:TOWER) ASSERT a.id IS UNIQUE;
CREATE INDEX ON :PERSON(number);
CREATE INDEX ON :TOWER(id);
//Create relationships between people and calls
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MATCH (a:PERSON {number: line.Calling}),(b:PERSON {number: line.Called}),(c:TOWER {id: line.CellID1})
CREATE (a)-[t:THROUGH]->(c)-[x:CONNECTS]->(b)
SET x.calltype = line.CallType, x.provider = line.Provider, t.time=toInteger(line.ts), t.duration=toInteger(line.Duration)
Upvotes: 0
Views: 107
Reputation: 5047
However, how do I now isolate only those records which exist before
t.time=1500399900
and not after that?
Let's create a small example data set:
CREATE
(a1:PERSON {name: 'a1'}), (a2:PERSON {name: 'a2'}),
(b1:PERSON {name: 'b1'}), (b2:PERSON {name: 'b2'}),
(b3:PERSON {name: 'b3'}), (b4:PERSON {name: 'b4'}),
(a1)-[:THROUGH {time: 1}]->(:TOWER)-[:CONNECTS]->(b1),
(a1)-[:THROUGH {time: 3}]->(:TOWER)-[:CONNECTS]->(b2),
(a2)-[:THROUGH {time: 2}]->(:TOWER)-[:CONNECTS]->(b3),
(a2)-[:THROUGH {time: 15}]->(:TOWER)-[:CONNECTS]->(b4)
It looks like this when visualized:
This query might do the trick for you:
MATCH (a:PERSON)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
WHERE toInteger(t1.time) < 5
OPTIONAL MATCH (a)-[t2:THROUGH]->(:TOWER)
WHERE t2.time >= 5
WITH a, b, t1, t2
WHERE t2 IS NULL
RETURN a, b, t1
After the first match, it looks for calls of PERSON
a
that were initiated after timestamp 5
. There might be no such calls, hence we it uses OPTIONAL MATCH
. The value of t2
will be null if there were no calls after the specified timestamp, so we do an IS NULL
check and return the filtered results.
Also, if I do not limit the above query to say 1000, my browser (Google Chrome), crashes. Any solution for that please?
If you use the graph visualizer, it usually cannot render more than a few hundred nodes. Possible workarounds:
SKIP ... LIMIT ...
.Upvotes: 1