Reputation: 37
I have a nodes of movies with property like:genre,writers,languages(arrays of string)
I want to get all the nodes with similar array property,and the similarity percent
for example,for genre property:
Movie1:Crime, Drama, Myster
Movie2:Drama
33% of similarity
in addition
in addition,i want a query that returns all the movies with at least one common genre writers,languages
I know that I have to use collect function,but how i compare the arrays
for example: Movie1:Crime, Drama, Myster
Movie2:Crime, Myster
Movie3:Myster,Comedy
Movie4:Comedy
Group1:Movie1,Movie2,Movie3
Group2:Movie3,Movie4
Upvotes: 1
Views: 72
Reputation: 29172
You can use the REDUCE
to count the intersections:
WITH
['Drama','Crime','Mystery'] as genre1,
['Drama'] as genre2
WITH
genre1,
genre2,
CASE WHEN size(genre1)>size(genre2)
THEN size(genre1)
ELSE size(genre2)
END as maxSize,
REDUCE(acc=0,
genre in genre1
| acc + CASE WHEN genre in genre2 THEN 1 ELSE 0 END
) as similarity
RETURN genre1,
genre2,
100.0 * similarity / maxSize as similarity
Or you can use apoc.coll.intersection
function from APOC library
:
WITH
['Drama','Crime','Mystery'] as genre1,
['Drama'] as genre2
WITH
genre1,
genre2,
apoc.coll.max([size(genre1), size(genre2)]) as maxSize,
apoc.coll.intersection(genre1, genre2) as similarity
RETURN genre1,
genre2,
100.0 * size(similarity) / maxSize as similarity
If you just want to find the nodes that intersect at least one genre:
MATCH (M:Movie)
UNWIND M.genres as genre
WITH genre,
M
ORDER BY id(M) ASC
WITH genre,
collect(M) as movies
RETURN distinct movies as movies
Upvotes: 2