oshri rehani
oshri rehani

Reputation: 37

Neo4j match nodes with similiraity property(array)

I have a nodes of movies with property like:genre,writers,languages(arrays of string)

I want to get all the nodes with similar array property,and the similarity percent

for example,for genre property:

Movie1:Crime, Drama, Myster

Movie2:Drama

33% of similarity

in addition

in addition,i want a query that returns all the movies with at least one common genre writers,languages

I know that I have to use collect function,but how i compare the arrays

for example: Movie1:Crime, Drama, Myster

Movie2:Crime, Myster

Movie3:Myster,Comedy

Movie4:Comedy

Group1:Movie1,Movie2,Movie3

Group2:Movie3,Movie4

Upvotes: 1

Views: 72

Answers (1)

stdob--
stdob--

Reputation: 29172

You can use the REDUCE to count the intersections:

WITH 
  ['Drama','Crime','Mystery'] as genre1,
  ['Drama'] as genre2
WITH
  genre1, 
  genre2,
  CASE WHEN size(genre1)>size(genre2) 
       THEN size(genre1) 
       ELSE size(genre2)
  END as maxSize, 
  REDUCE(acc=0, 
         genre in genre1 
         | acc + CASE WHEN genre in genre2 THEN 1 ELSE 0 END
  ) as similarity
RETURN genre1, 
       genre2, 
       100.0 * similarity / maxSize as similarity

Or you can use apoc.coll.intersection function from APOC library:

WITH 
  ['Drama','Crime','Mystery'] as genre1,
  ['Drama'] as genre2
WITH
  genre1, 
  genre2,
  apoc.coll.max([size(genre1), size(genre2)]) as maxSize,
  apoc.coll.intersection(genre1, genre2) as similarity
RETURN genre1, 
       genre2, 
       100.0 * size(similarity) / maxSize as similarity

If you just want to find the nodes that intersect at least one genre:

MATCH (M:Movie)
UNWIND M.genres as genre
WITH genre, 
     M
     ORDER BY id(M) ASC
WITH genre, 
     collect(M) as movies
RETURN distinct movies as movies

Upvotes: 2

Related Questions