user12678036

Reputation: 59

Neo4j query takes an eternity to execute

My query takes an eternity to compute Jaccard similarity. The data comes from a .csv file with 100,000 entries. I have already created indexes on the two basic node labels (on id and value). I have also tried the Jaccard algorithm in the Playground, but it takes an eternity to run there too.

MATCH (i:Item)-[:HAS]->(p2:Properties)<-[:HAS]-(i1:Item)
WITH {item:id(i), categories: collect(id(i1))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard.stream(data, {similarityCutoff: 0.5})
YIELD item1, item2, count1, count2, intersection, similarity
RETURN algo.asNode(item1).id AS from, algo.asNode(item2).id AS to, intersection, similarity

Can anyone help?

Upvotes: 1

Views: 73

Answers (1)

jose_bacoy

Reputation: 12684

The first two lines of your query are not correct. You should write them like this:

OLD: 
MATCH (i:Item)-[:HAS]->(p2:Properties)<-[:HAS]-(i1:Item)
WITH {item:id(i), categories: collect(id(i1))} as userData

NEW: 
MATCH (i:Item)-[:HAS]->(p2:Properties)
WITH {item:id(i), categories: collect(id(p2))} as userData
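To make the difference concrete, here is a small Python sketch (not Neo4j code) of the data the NEW lines build: one map per item, holding the ids of the properties that item HAS. The edge list and the function name build_item_data are hypothetical, and the grouping only mimics what collect(id(p2)) does per item; Cypher itself does not guarantee the order of the collected ids.

```python
from collections import defaultdict

def build_item_data(has_edges):
    """Group hypothetical (item_id, property_id) HAS edges into the
    [{'item': ..., 'categories': [...]}] shape the Jaccard procedure receives."""
    categories = defaultdict(list)
    for item_id, property_id in has_edges:
        # Mimics collect(id(p2)) grouped by id(i)
        categories[item_id].append(property_id)
    return [{"item": item, "categories": props} for item, props in categories.items()]

edges = [(1, 10), (1, 11), (1, 12), (2, 11), (2, 12), (2, 13)]
print(build_item_data(edges))
# [{'item': 1, 'categories': [10, 11, 12]}, {'item': 2, 'categories': [11, 12, 13]}]
```

Note that the categories are property ids, not the ids of other items as in the OLD version.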

Here is what the Jaccard algorithm does. It scores how similar one item (say Item1) is to another item (say Item2) with a number from 0 to 1 inclusive, based on the properties they share: the number of properties they have in common divided by the number of distinct properties across both items. For example, Item1 has properties 1, 2, 3 and Item2 has properties 2, 3, 4. Their Jaccard similarity is 2/4 = 0.5, because properties 2 and 3 are common and there are 4 distinct properties (1, 2, 3, 4) across both items.
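The worked example can be checked with a few lines of Python. This is a plain sketch of the Jaccard formula (size of the intersection divided by size of the union), not the Neo4j implementation; returning 0.0 when both sets are empty is an assumption of this sketch to avoid dividing by zero.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|, a value from 0 to 1 inclusive."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets score 0 in this sketch
    return len(a & b) / len(union)

# Item1 has properties 1, 2, 3; Item2 has properties 2, 3, 4
print(jaccard([1, 2, 3], [2, 3, 4]))  # 2 common / 4 distinct -> 0.5
```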

So in your query you only need to state which properties each item has; you don't need to match a second item that shares those properties. The procedure iterates over all items itself and computes the Jaccard index for every pair (item1 vs item2, item1 vs item3, ..., item2 vs item3, and so on). That is the input format algo.similarity.jaccard.stream expects.
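The pairwise iteration described above can be sketched in Python as follows. It compares each pair of items once, keeps pairs whose similarity reaches the cutoff, and yields rows named after the YIELD columns in the question's query. It is only an illustration: the function name jaccard_stream is hypothetical, and treating the cutoff as inclusive (>=) and comparing each unordered pair once are assumptions, not confirmed library behavior. It also shows why the work grows quickly: n items give n(n-1)/2 pairs, so the number of comparisons grows quadratically with the number of items.

```python
from itertools import combinations

def jaccard_stream(data, similarity_cutoff=0.0):
    """Illustrative only: compare every unordered pair of items once and yield
    rows shaped like the YIELD columns used in the question's query."""
    for x, y in combinations(data, 2):  # n * (n - 1) / 2 pairs
        a, b = set(x["categories"]), set(y["categories"])
        union = a | b
        intersection = len(a & b)
        similarity = intersection / len(union) if union else 0.0
        if similarity >= similarity_cutoff:  # assumption: inclusive cutoff
            yield {
                "item1": x["item"], "item2": y["item"],
                "count1": len(a), "count2": len(b),
                "intersection": intersection, "similarity": similarity,
            }

data = [
    {"item": 1, "categories": [1, 2, 3]},
    {"item": 2, "categories": [2, 3, 4]},
    {"item": 3, "categories": [7, 8]},
]
for row in jaccard_stream(data, similarity_cutoff=0.5):
    print(row)
# {'item1': 1, 'item2': 2, 'count1': 3, 'count2': 3, 'intersection': 2, 'similarity': 0.5}
```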

See the reference here: https://neo4j.com/docs/graph-algorithms/current/labs-algorithms/jaccard/

Upvotes: 1
