Reputation: 535
I have a dataset with nodes that are companies linked by transactions.
A company has these properties : name, country, type, creation_date
The relationships "SELLS_TO" have these properties : item, date, amount
All dates are in the following format YYYYMMDD.
I'm trying to find a series of transactions that : - include 2 companies from 2 distinct countries - where between the first node in the series and the last one, there is a company that has been created less than 90 days ago - where the total time between the first transaction and the last transaction is < 15 days
I think I can handle the conditions 1) and 2) but I'm stuck on 3).
MATCH (a:Company)-[r:SELLS_TO]->(b:Company)-[v:SELLS_TO*]->(c:Company)
WHERE NOT(a.country = c.country) AND (b.creation_date + 90 < 20140801)
Basically I don't know how to get the date of the last transaction in the series. Anyone knows how to do that?
Upvotes: 0
Views: 364
Reputation: 2272
jvilledieu,
In answer to your most immediate question, you can access the collections of nodes and relationships in the matched path and get the information you need. The query would look something like this.
MATCH p=(a:Company)-[rs:SELLS_TO*]->(c:Company)
WHERE a.country <> c.country
WITH p, a, c, rs, nodes(p) AS ns
WITH p, a, c, rs, filter(n IN ns WHERE n.creation_date - 20140801 < 90) AS bs
WITH p, a, c, rs, head(bs) AS b
WHERE NOT b IS NULL
WITH p, a, b, c, head(rs) AS r1, last(rs) AS rn
WITH p, a, b, c, r1, rn, rn.date - r1.date AS d
WHERE d < 15
RETURN a, b, c, d, r1, rn
This query finds a chain with at least one :SELLS_TO relationship between :Company nodes and assigns the matched path to 'p'. The match is then limited to cases where the first and last company have different countries. At this point the WITH clauses develop the other elements that you need. The collection of nodes in the path is obtained and named 'ns'. From this, a collection of nodes where the creation date is less than 90 days from the target date is found and named 'bs'. The first node of the 'bs' collection is then found and named 'b', and the match is limited to cases where a 'b' node was found. The first and last relationships are then found and named 'r1' and 'rn'. After this, the difference in their dates is calculated and named 'd'. The match is then limited to cases where d is less than 15.
So that gives you an idea of how to do this. There is another problem though. At least, in the way you have described the problem, you will find that the date math will fail. Dates that are represented as numbers, such as 20140801, are not linear, and thus cannot be used for interval math. As an example, 15 days from 20140820 is 20140904. If you subtract these two date 'numbers', you get 84. One example of how to do this is to represent your dates as days since an epoch date.
Grace and peace,
Jim
Upvotes: 2