Reputation: 213
I have the following table, which gives how often each Originator performs each task (please see the attached image).
Task frequency for each Originator
I represented the above table in Neo4j with the relationship Originator -[Frequency]->Task.
Now I need to compute the similarity (e.g. Jaccard similarity) between two users using only Cypher queries. How is this possible, or would the schema definition have to be changed altogether?
Thanks in advance.
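For reference, the Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|, i.e. the number of tasks both originators perform divided by the number of distinct tasks either of them performs. A minimal Python sketch of that arithmetic, outside Neo4j; the task sets are hypothetical, not taken from the table above, and returning 0.0 for two empty sets is an assumed convention:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    if not union:
        # Assumed convention: two originators with no tasks get similarity 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical task sets, not the asker's real data.
john_tasks = {"Act A", "Act B", "Act C"}
sue_tasks = {"Act B", "Act C", "Act D"}

print(jaccard(john_tasks, sue_tasks))  # 2 common tasks / 4 distinct tasks -> 0.5
```

Frequencies are ignored here; only whether an originator performs a task at all matters.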
Reputation: 213
The following link solved my problem. I just had to take every link into consideration.
http://neo4j.com/docs/stable/cypher-cookbook-similarity-calc.html
Reputation: 3739
This is more a starting point than an answer! If we start by ignoring the frequency value, I think you can try something like:
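```
MATCH (u1:Originator {name: 'John'}), (u2:Originator {name: 'Sue'})
// Count the distinct tasks that both originators perform
OPTIONAL MATCH (u1)-[:FREQUENCY]->(c:Task)<-[:FREQUENCY]-(u2)
WITH u1, u2, COUNT(DISTINCT c) AS intersection
// Collect every task performed by u1
OPTIONAL MATCH (u1)-[:FREQUENCY]->(t:Task)
WITH u1, u2, intersection, COLLECT(DISTINCT t) AS t1s
// Add the tasks performed by u2 that u1 does not perform
OPTIONAL MATCH (u2)-[:FREQUENCY]->(t:Task)
WHERE NOT t IN t1s
WITH u1, u2, intersection, t1s + COLLECT(DISTINCT t) AS unionTasks
// toFloat avoids integer division; the CASE guards against an empty union
RETURN u1, u2,
       CASE WHEN size(unionTasks) = 0 THEN 0.0
            ELSE toFloat(intersection) / size(unionTasks)
       END AS js
```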
This is definitely untested, and it could probably be made more efficient by avoiding matching the tasks repeatedly.
The query finds the tasks that the two users have in common and stores their number in the variable `intersection`. It then (optionally) matches each user's tasks individually and uses them to calculate the union (`COLLECT` returns an empty list when there are zero matches). If neither user has any tasks, the union is empty, so the final `RETURN` statement has to work around a possible divide by zero. Note also that dividing two integers in Cypher performs integer division, so one operand should be converted with `toFloat()` to get a fractional similarity.
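To make the steps above concrete, here is a rough Python mirror of the same logic, outside Neo4j: count the common tasks, build the union as the first user's distinct tasks plus the second user's tasks not already present, then guard the division. The function name `query_style_jaccard` and the sample lists are hypothetical, chosen only for illustration:

```python
def query_style_jaccard(tasks1: list, tasks2: list) -> float:
    # Step 1: number of distinct tasks both originators perform (the `intersection` variable).
    intersection = len(set(tasks1) & set(tasks2))
    # Step 2: distinct tasks of the first originator, like COLLECT(DISTINCT t) AS t1s.
    t1s = list(dict.fromkeys(tasks1))
    # Step 3: append the second originator's distinct tasks that are not already in t1s.
    extra = [t for t in dict.fromkeys(tasks2) if t not in t1s]
    union_tasks = t1s + extra
    # Step 4: guard against an empty union (divide by zero).
    # Python's / is true division; Cypher needs toFloat() to get the same result.
    if not union_tasks:
        return 0.0
    return intersection / len(union_tasks)


print(query_style_jaccard(["Act A", "Act B", "Act C"], ["Act B", "Act C", "Act D"]))  # 0.5
```

Because COLLECT ignores nulls, an empty OPTIONAL MATCH contributes an empty list, which corresponds to an empty `extra` list here.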
It is hard to say how frequency should affect the result. I wonder whether you would be better served by replacing `:Frequency` with `:Completed` and creating a new relationship for every task completed (i.e. 6 relationships between 'John' and 'Act A'). This would work well for the intersection, but would still have interesting implications for the union.
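One possible way to let frequency count, assuming each completion is stored as its own `:Completed` relationship, is the multiset (weighted) Jaccard similarity: the intersection takes the smaller count per task and the union takes the larger count per task. This is a swapped-in technique, not something from the cookbook link above. A minimal Python sketch of the arithmetic; apart from John's 6 completions of 'Act A', all counts are hypothetical:

```python
from collections import Counter


def multiset_jaccard(completions1: list, completions2: list) -> float:
    """Weighted Jaccard: sum of per-task minimum counts over sum of per-task maximum counts."""
    c1, c2 = Counter(completions1), Counter(completions2)
    # Counter & keeps the minimum count per key; Counter | keeps the maximum.
    union_size = sum((c1 | c2).values())
    if union_size == 0:
        return 0.0
    return sum((c1 & c2).values()) / union_size


# One list entry per completion, i.e. per hypothetical :Completed relationship.
john = ["Act A"] * 6 + ["Act B"] * 2
sue = ["Act A"] * 4 + ["Act B"] * 2 + ["Act C"] * 2

print(multiset_jaccard(john, sue))  # (4 + 2) / (6 + 2 + 2) -> 0.6
```

With every count equal to 1 this reduces to the plain Jaccard similarity, so it behaves as an extension of the unweighted query rather than a replacement.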