Reputation: 11

Aggregate related nodes across all collections in Neo4j cypher

I recently started working with Neo4j/cypher and have been able to successfully build most basic queries that come to mind but the solution to this one escapes me.

The nodes have a very simple relationship model: Books are grouped into Categories

The books will be unique and can be related to multiple categories.

My base query collects the categories, resulting in a set of books with their associated categories:

match (c:Category)-[:contains]-(b:Book)
return b as book, collect(distinct c) as categories

I can then collect the books, resulting in a set of related books and categories:

match (c:Category)-[:contains]-(b:Book)
with b, collect(distinct c) as categories
return collect(distinct b) as books, categories

This seems to be going in the right direction but there are many duplicate books and categories throughout. Here is a pseudo example:

Books                         Categories
-----------------------------------------------
[Easy Home Updates]           [Home and Garden]
-----------------------------------------------
[Gardening Today,             [Outdoors,
 Gardening for Kids,           Hobbies,
 Green Thumb Made Easy]        Gardening]
-----------------------------------------------
[Conversational Spanish,      [Spanish,
 Spanish for Travelers,        Travel,
 Advanced Spanish]             Language]
-----------------------------------------------
[Gardening Today,             [Gardening,
 Gardening for Kids]           Kids]
-----------------------------------------------
[Home Improvement,            [Home Improvement,
 Easy Home Updates,            Home and Garden,
 Family Home Projects]         Family]
-----------------------------------------------
[Gardening Today]             [Gardening]
-----------------------------------------------
[Conversational Spanish,      [Language,
 Advanced Spanish]             Spanish]

I cannot seem to find a way to aggregate the duplicates either in the initial match with filtering or the reduce and apoc functions.

The desired result would be to reduce both the book and category collections. Something like this:

Books                         Categories
----------------------------------------------
[Gardening Today,             [Gardening,
 Gardening for Kids,           Outdoors,
 Green Thumb Made Easy]        Hobbies,
                               Kids,
                               Family]
----------------------------------------------
[Conversational Spanish,      [Spanish,
 Spanish for Travelers,        Language,
 Advanced Spanish]             Travel,
                               Education]
----------------------------------------------
[Home Improvement,            [Home and Garden,
 Easy Home Updates,            Home Improvement,
 Family Home Projects]         Construction]

Or maybe my approach is completely off and there is a better, more efficient way to group the related nodes.

Any help you can provide to point me in the right direction would be greatly appreciated. Please let me know if you need any further clarification.

Upvotes: 1

Answers (1)

ThirstForKnowledge

Reputation: 1304

Creating your model

For the ease of possible further answers and solutions I note my graph creating statement:

CREATE
  (categoryHome:Category {name: 'Home and Garden'}),
  (categoryOutdoor:Category {name: 'Outdoors'}),
  (categoryHobby:Category {name: 'Hobbies'}),
  (categoryGarden:Category {name: 'Gardening'}),
  (categorySpanish:Category {name: 'Spanish'}),
  (categoryTravel:Category {name: 'Travel'}),
  (categoryLanguage:Category {name: 'Language'}),
  (categoryKids:Category {name: 'Kids'}),
  (categoryImprovement:Category {name: 'Home Improvement'}),
  (categoryFamily:Category {name: 'Family'}),
  (book1:Book {name: 'Easy Home Updates'}),
  (book2:Book {name: 'Gardening Today'}),
  (book3:Book {name: 'Gardening for Kids'}),
  (book4:Book {name: 'Green Thumb Made Easy'}),
  (book5:Book {name: 'Conversational Spanish'}),
  (book6:Book {name: 'Spanish for Travelers'}),
  (book7:Book {name: 'Advanced Spanish'}),
  (book8:Book {name: 'Home Improvement'}),
  (book9:Book {name: 'Easy Home Updates'}),
  (book10:Book {name: 'Family Home Projects'}),
  (categoryHome)-[:CONTAINS]->(book1),
  (categoryHome)-[:CONTAINS]->(book8),
  (categoryHome)-[:CONTAINS]->(book9),
  (categoryHome)-[:CONTAINS]->(book10),
  (categoryOutdoor)-[:CONTAINS]->(book2),
  (categoryOutdoor)-[:CONTAINS]->(book3),
  (categoryOutdoor)-[:CONTAINS]->(book4),
  (categoryHobby)-[:CONTAINS]->(book2),
  (categoryHobby)-[:CONTAINS]->(book3),
  (categoryHobby)-[:CONTAINS]->(book4),
  (categoryGarden)-[:CONTAINS]->(book2),
  (categoryGarden)-[:CONTAINS]->(book3),
  (categoryGarden)-[:CONTAINS]->(book4),
  (categorySpanish)-[:CONTAINS]->(book5),
  (categorySpanish)-[:CONTAINS]->(book6),
  (categorySpanish)-[:CONTAINS]->(book7),
  (categoryTravel)-[:CONTAINS]->(book5),
  (categoryTravel)-[:CONTAINS]->(book6),
  (categoryTravel)-[:CONTAINS]->(book7),
  (categoryLanguage)-[:CONTAINS]->(book5),
  (categoryLanguage)-[:CONTAINS]->(book6),
  (categoryLanguage)-[:CONTAINS]->(book7),
  (categoryKids)-[:CONTAINS]->(book2),
  (categoryKids)-[:CONTAINS]->(book3),
  (categoryImprovement)-[:CONTAINS]->(book8),
  (categoryImprovement)-[:CONTAINS]->(book9),
  (categoryImprovement)-[:CONTAINS]->(book10),
  (categoryFamily)-[:CONTAINS]->(book8),
  (categoryFamily)-[:CONTAINS]->(book9),
  (categoryFamily)-[:CONTAINS]->(book10);

Explanation

In my eyes, your technical implementation is right, but your requirements from a professional point of view are not consistent. Let us choose an example. You expect the following record:

BOOKS:                        CATEGORIES:
Gardening Today,              Gardening,
Gardening for Kids,           Outdoors,
Green Thumb Made Easy         Hobbies,
                              Kids,
                              Family

By executing the following Cypher query, the Family entry is not a valid category for the book Gardening Today.

MATCH (book:Book {name: 'Gardening Today'})<-[:CONTAINS]-(category:Category)
RETURN DISTINCT book.name, collect(category.name);

╒═════════════════╤═════════════════════════════════════════╕
│"book.name"      │"collect(category.name)"                 │
╞═════════════════╪═════════════════════════════════════════╡
│"Gardening Today"│["Kids","Gardening","Hobbies","Outdoors"]│
└─────────────────┴─────────────────────────────────────────┘

Doing the cross check confirms category Family contains completly other books.

MATCH (category:Category {name: 'Family'})-[:CONTAINS]->(book:Book)
RETURN DISTINCT category.name, collect(book.name);
╒═══════════════╤═══════════════════════════════════════════════════════════════╕
│"category.name"│"collect(book.name)"                                           │
╞═══════════════╪═══════════════════════════════════════════════════════════════╡
│"Family"       │["Family Home Projects","Easy Home Updates","Home Improvement"]│
└───────────────┴───────────────────────────────────────────────────────────────┘

This procedure continues to propagate. That's the reason why you get differently sliced result sets as expected. So your already implemented approach is correct:

MATCH path = (category:Category)-[:CONTAINS]->(book:Book)
WITH collect(category.name) AS categoryGroup, book.name AS bookName
RETURN categoryGroup, collect(bookName);

╒═════════════════════════════════════════════════════════════════╤═════════════════════════════════════════════════════════════════════╕
│"categoryGroup"                                                  │"collect(bookName)"                                                  │
╞═════════════════════════════════════════════════════════════════╪═════════════════════════════════════════════════════════════════════╡
│["Spanish","Travel","Language"]                                  │["Spanish for Travelers","Advanced Spanish","Conversational Spanish"]│
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Home Improvement","Family","Home and Garden","Home and Garden"]│["Easy Home Updates"]                                                │
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Hobbies","Gardening","Kids","Outdoors"]                        │["Gardening Today","Gardening for Kids"]                             │
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Hobbies","Gardening","Outdoors"]                               │["Green Thumb Made Easy"]                                            │
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Home Improvement","Family","Home and Garden"]                  │["Home Improvement","Family Home Projects"]                          │
└─────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────┘

Extension

Underlying Idea

Because the requested mapping violates the assignment rules (set theory), we can not use the usual pattern matching. Instead, we can achieve our goal through a trick by finding all connected nodes for a given book and preparing them afterwards.

Please ensure you have installed the Neo4j APOC library.

Solution

MATCH (selectedBook:Book)
  WHERE selectedBook.name = 'Gardening for Kids'
CALL apoc.path.subgraphNodes(selectedBook, {uniqueness: 'NODE_GLOBAL'}) YIELD node
WITH collect(DISTINCT node) AS subgraphNodes
WITH
  filter (node IN subgraphNodes
    WHERE node:Category) AS categories,
  filter (node IN subgraphNodes
    WHERE node:Book) AS books
WITH categories, books
UNWIND categories AS category
UNWIND books AS book
RETURN collect(DISTINCT category.name) AS categoryNames, collect(DISTINCT book.name) AS bookNames;

Explanation

lines 1-2: selecting the book under inspection
line 3: using the APOC procedure apoc.path.subgraphNodes for locating all connected nodes
lines 6-9: sorting the identified nodes by label Category and Book
lines 10-13: result preparation

Results

Easy Home Updates:

╒═══════════════════════════════════════════════╤═══════════════════════════════════════════════════════════════╕
│"categoryNames"                                │"bookNames"                                                    │
╞═══════════════════════════════════════════════╪═══════════════════════════════════════════════════════════════╡
│["Home and Garden","Family","Home Improvement"]│["Easy Home Updates","Family Home Projects","Home Improvement"]│
└───────────────────────────────────────────────┴───────────────────────────────────────────────────────────────┘

Gardening for Kids:

╒════════════════════════════════════════╤════════════════════════════════════════╕
│"categoryNames"                         │"bookNames"                             │
╞════════════════════════════════════════╪════════════════════════════════════════╡
│["Kids","Gardening","Hobbies","Outdoors"│["Gardening for Kids","Gardening Today",│
│]                                       │"Green Thumb Made Easy"]                │
└────────────────────────────────────────┴────────────────────────────────────────┘

Upvotes: 0

Aggregate related nodes across all collections in Neo4j cypher

Answers (1)

Creating your model

Explanation

Extension

Underlying Idea

Solution

Explanation

Results

Related Questions