kyandy
kyandy

Reputation: 21

Neo4j, How to return the top n from diffreent tag groups?

I have a database which represents "Posts", it contains creation date, tags, titles. some posts are answers to other posts, then some posts are questions and other posts are answers. The task here is to find the questions which have the shortest answered time in each different tag group. How I achieve this?

I already can found all answered questions with their time difference between the questions' creation time and its first answer's creation time for each tag, but I cannot return the question with the shortest answer time for each tag (top 1 in each tag group). I can only return all things.

Can anyone help me with this issue?

This is my query:

WITH ['geospatial', 'economics', 'usa', 'demographics'] AS topiclist
UNWIND topiclist AS topics
Match (p1:Posts)
UNWIND p1.Tags AS tags
WITH p1,trim(tags) AS tag
Where tag = topics
Match (p1)-[:PARENT_OF]->(p2:Posts)
WITH p1, p2.CreationDate - p1.CreationDate AS time,tag,p2
ORDER BY tag,time
Return p1.Title ,time,tag

Thanks for Michael Hunger's answer, that solved my issue!! If you have a similar question, please check his answer.

output sample:

returned result, the 3 colums represent : questions, time difference, tags

Sample data:

All data will be used looks like this image, the post with a parentId is an answer, its parent is its belonged question

Upvotes: 0

Views: 180

Answers (1)

Michael Hunger
Michael Hunger

Reputation: 41676

It would be good if you do the trim when you store the data.

WITH ['geospatial', 'economics', 'usa', 'demographics'] AS topiclist
UNWIND topiclist AS topics

Match (p1:Posts) where single(tag in p1.tags where trim(tag) = topic)
Match (p1)-[:PARENT_OF]->(p2:Posts)
WITH p1, p2.CreationDate - p1.CreationDate AS time,tag,p2
ORDER BY tag,time
with tag, head(collect({post:p1,time:time}) as first
Return first.post.Title as title,first.time as time,tag

Upvotes: 1

Related Questions