Reputation: 7587
Imagine a graph filled with data about individuals. Each node has a property named "age". I want to return a sample containing as many nodes as there are distinct values of "age", so if there is a node for every age from 0 to 90, the sample size would be 91.
How can I achieve this with Cypher?
What I actually want is to return a certain number of random elements, each with a distinct value for "age", so just obtaining every distinct property value without the corresponding node is not sufficient.
Upvotes: 0
Views: 2140
Reputation: 30397
If you are able to change the graph, you may want to extract individuals' ages into :Age nodes (which only works because age is static in your data).
APOC Procedures has a categorization refactoring procedure that can help out here.
That way, to get a person for each age, you just match on all :Age nodes and take one connected node for each of them.
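If APOC is not available, a similar refactoring can be sketched in plain Cypher. This is only a minimal sketch: it assumes the nodes carry a :Person label and uses a :HasAge relationship type pointing from the person to the :Age node (both names are assumptions chosen to match the queries in this answer), and it is not necessarily what the APOC procedure does internally.

```cypher
// Assumed names: :Person, :Age, :HasAge, and the "age" property.
// Skip people without an age so no :Age node with a null key is attempted.
MATCH (p:Person)
WHERE p.age IS NOT NULL
// Reuse the :Age node for this value if it already exists, otherwise create it.
MERGE (a:Age {age: p.age})
// Link the person to its age node exactly once.
MERGE (p)-[:HasAge]->(a)
```

On a large graph this would probably need to run in batches, and a uniqueness constraint on the "age" property of :Age nodes would likely make the MERGE lookups cheaper.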
EDIT
To make the selection for each age random, we can use cybersam's approach of collecting the nodes and grabbing one at a random index.
With APOC Procedures, we also have the option of using apoc.coll.randomItem() to grab a random item from a collection. It's basically doing the same thing under the hood.
The full query, assuming you have distinct :Age nodes (with an "age" property) that have relationships to :Person nodes, would look like this:
MATCH (age:Age)<-[:HasAge]-(p:Person)
RETURN age.age as age, apoc.coll.randomItem(collect(p)) as randomPerson
You mentioned that you need "a certain amount of random elements", each with a distinct age, so we can modify the above query to collect the random person chosen for each age, and use apoc.coll.randomItems() to get however many random entries you need.
MATCH (age:Age)<-[:HasAge]-(p:Person)
WITH age, apoc.coll.randomItem(collect(p)) as randomPerson
RETURN apoc.coll.randomItems(collect(randomPerson), $numberOfItemsDesired) as randomPeople
Upvotes: 0
Reputation: 66967
This might work for you:
MATCH (p:Person)
RETURN p.age AS age, COLLECT(p)[toInteger(rand() * COUNT(p))] AS person;
The query groups people by age, collects everyone in each group, and picks a random person from each collection, so the result has one row per distinct age in the DB.
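To return only a certain number of those people without APOC, the per-age picks can be shuffled and truncated. The following is a minimal sketch under a few assumptions: $sampleSize is a hypothetical query parameter holding the desired number of rows, and toInteger() and size() are assumed to be available in the Neo4j version in use.

```cypher
// Group people by age and keep the whole group as a list.
MATCH (p:Person)
WITH p.age AS age, collect(p) AS people
// rand() is in [0, 1), so the index always falls inside the list.
WITH age, people[toInteger(rand() * size(people))] AS person
RETURN age, person
// Shuffle the one-per-age rows, then keep only the requested number.
ORDER BY rand()
LIMIT $sampleSize
```

Because each row already corresponds to a different age, the rows that survive the LIMIT still have distinct ages. If $sampleSize exceeds the number of distinct ages, the query simply returns one row per age.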
Upvotes: 3