Reputation: 247
Assuming I have a neo4j graph of the nature:
create (SK:Author {name:'Stephen King'}), (JK:Author {name:'J.K. Rowling'}), (DS:Author {name:'Dr. Seuss'}), (TS:Book {name:'The Stand'}), (HP:Book {name:'Harry Potter'}), (CH:Book {name:'Cat in the Hat'}), (SHINING:Book {name:'The Shining'}), (PAF:Genre {name:'Post-Apocalyptic fiction'}), (F:Genre {name:'Fantasy'}), (C:Genre {name:'Childrens'}), (HORROR:Genre {name:'Horror'}), (SK)<-[:WRITTEN_BY]-(TS)-[:CATEGORIZED_AS]->(PAF), (JK)<-[:WRITTEN_BY]-(HP)-[:CATEGORIZED_AS]->(F), (DS)<-[:WRITTEN_BY]-(CH)-[:CATEGORIZED_AS]->(C), (SK)<-[:WRITTEN_BY]-(SHINING)-[:CATEGORIZED_AS]->(HORROR)
Neo4j console link: http://console.neo4j.org/r/2d69kq
I have about 53,000 author nodes, 6 million book nodes, and 9,000 genre nodes.
For a query like:
match (b:Book)-[:WRITTEN_BY]->(a:Author)
where a.name in ['Stephen King', 'J.K. Rowling']
with a, collect(b) as bs
unwind bs as book
match (g:Genre)<-[r:CATEGORIZED_AS]-(book)
where id(g) in [13, 14, 15, 16]
with a, count(distinct book) as book_count_author, collect(book) as bs
unwind bs as book
match (g:Genre)<-[r:CATEGORIZED_AS]-(book)
where id(g) in [13, 14, 15, 16]
return a.name, g.name, count(distinct book) as book_count_genre, book_count_author
It takes roughly 12 seconds to complete. I've tried to rewrite the query a few different ways and use index hinting, but can't figure out any ways to make it any faster. Any thoughts? Obviously this example is simplified, but I do have indexes on the appropriate properties.
Here's a sample result from the chained aggregations:
I need two aggregations. The first one is count of books by authors, constrained by the specified genres in the 2nd match. The second count is of books by each author in each genre, again constrained by the same genres.
Upvotes: 0
Views: 70
Reputation: 41706
What @InverseFalcon said.
You can optimize it a bit further:
MATCH (g:Genre)<-[r:CATEGORIZED_AS]-(book:Book)-[:WRITTEN_BY]->(a:Author)
WHERE a.name in ['Stephen King', 'J.K. Rowling']
AND g.name IN ['Post-Apocalyptic fiction','Childrens','Horror','Fantasy']
WITH a, g, count(distinct book) as book_count_author_genre
RETURN a.name, collect({ genre: g.name, count: book_count_author_genre}),
sum(book_count_author_genre) as book_count_author
You might need to use index-hints for authors and genres
If you use id's for lookup it will be faster.
Upvotes: 0
Reputation: 30407
Assuming that you have an index (or existence constraint) on :Author.name, you can probably use this:
match (book:Book)-[:WRITTEN_BY]->(a:Author)
where a.name in ['Stephen King', 'J.K. Rowling']
with a, book
match (g:Genre)<-[:CATEGORIZED_AS]-(book)
return a, g, size((g)<-[:CATEGORIZED_AS]-()) as book_count_genre, count(book) as book_count_author
Collecting and unwinding can be expensive, try to avoid it where you can. Note that you can find the size of a pattern to get a count of the pattern that would otherwise not be retrievable by a count on a node given your columns.
EDIT
Modified query for clarified requirements
book_count_genre - the number of books with that genre by any author
book_count_author - the number of books with that genre written by the given author
Upvotes: 0