Reputation: 11
I recently started working with Neo4j/cypher and have been able to successfully build most basic queries that come to mind but the solution to this one escapes me.
The nodes have a very simple relationship model: Books are grouped into Categories
The books will be unique and can be related to multiple categories.
My base query collects the categories, resulting in a set of books with their associated categories:
match (c:Category)-[:contains]-(b:Book)
return b as book, collect(distinct c) as categories
I can then collect the books, resulting in a set of related books and categories:
match (c:Category)-[:contains]-(b:Book)
with b, collect(distinct c) as categories
return collect(distinct b) as books, categories
This seems to be going in the right direction but there are many duplicate books and categories throughout. Here is a pseudo example:
Books Categories
-----------------------------------------------
[Easy Home Updates] [Home and Garden]
-----------------------------------------------
[Gardening Today, [Outdoors,
Gardening for Kids, Hobbies,
Green Thumb Made Easy] Gardening]
-----------------------------------------------
[Conversational Spanish, [Spanish,
Spanish for Travelers, Travel,
Advanced Spanish] Language]
-----------------------------------------------
[Gardening Today, [Gardening,
Gardening for Kids] Kids]
-----------------------------------------------
[Home Improvement, [Home Improvement,
Easy Home Updates, Home and Garden,
Family Home Projects] Family]
-----------------------------------------------
[Gardening Today] [Gardening]
-----------------------------------------------
[Conversational Spanish, [Language,
Advanced Spanish] Spanish]
I cannot seem to find a way to aggregate the duplicates either in the initial match with filtering or the reduce and apoc functions.
The desired result would be to reduce both the book and category collections. Something like this:
Books Categories
----------------------------------------------
[Gardening Today, [Gardening,
Gardening for Kids, Outdoors,
Green Thumb Made Easy] Hobbies,
Kids,
Family]
----------------------------------------------
[Conversational Spanish, [Spanish,
Spanish for Travelers, Language,
Advanced Spanish] Travel,
Education]
----------------------------------------------
[Home Improvement, [Home and Garden,
Easy Home Updates, Home Improvement,
Family Home Projects] Construction]
Or maybe my approach is completely off and there is a better, more efficient way to group the related nodes.
Any help you can provide to point me in the right direction would be greatly appreciated. Please let me know if you need any further clarification.
Upvotes: 1
Views: 137
Reputation: 1304
For the ease of possible further answers and solutions I note my graph creating statement:
CREATE
(categoryHome:Category {name: 'Home and Garden'}),
(categoryOutdoor:Category {name: 'Outdoors'}),
(categoryHobby:Category {name: 'Hobbies'}),
(categoryGarden:Category {name: 'Gardening'}),
(categorySpanish:Category {name: 'Spanish'}),
(categoryTravel:Category {name: 'Travel'}),
(categoryLanguage:Category {name: 'Language'}),
(categoryKids:Category {name: 'Kids'}),
(categoryImprovement:Category {name: 'Home Improvement'}),
(categoryFamily:Category {name: 'Family'}),
(book1:Book {name: 'Easy Home Updates'}),
(book2:Book {name: 'Gardening Today'}),
(book3:Book {name: 'Gardening for Kids'}),
(book4:Book {name: 'Green Thumb Made Easy'}),
(book5:Book {name: 'Conversational Spanish'}),
(book6:Book {name: 'Spanish for Travelers'}),
(book7:Book {name: 'Advanced Spanish'}),
(book8:Book {name: 'Home Improvement'}),
(book9:Book {name: 'Easy Home Updates'}),
(book10:Book {name: 'Family Home Projects'}),
(categoryHome)-[:CONTAINS]->(book1),
(categoryHome)-[:CONTAINS]->(book8),
(categoryHome)-[:CONTAINS]->(book9),
(categoryHome)-[:CONTAINS]->(book10),
(categoryOutdoor)-[:CONTAINS]->(book2),
(categoryOutdoor)-[:CONTAINS]->(book3),
(categoryOutdoor)-[:CONTAINS]->(book4),
(categoryHobby)-[:CONTAINS]->(book2),
(categoryHobby)-[:CONTAINS]->(book3),
(categoryHobby)-[:CONTAINS]->(book4),
(categoryGarden)-[:CONTAINS]->(book2),
(categoryGarden)-[:CONTAINS]->(book3),
(categoryGarden)-[:CONTAINS]->(book4),
(categorySpanish)-[:CONTAINS]->(book5),
(categorySpanish)-[:CONTAINS]->(book6),
(categorySpanish)-[:CONTAINS]->(book7),
(categoryTravel)-[:CONTAINS]->(book5),
(categoryTravel)-[:CONTAINS]->(book6),
(categoryTravel)-[:CONTAINS]->(book7),
(categoryLanguage)-[:CONTAINS]->(book5),
(categoryLanguage)-[:CONTAINS]->(book6),
(categoryLanguage)-[:CONTAINS]->(book7),
(categoryKids)-[:CONTAINS]->(book2),
(categoryKids)-[:CONTAINS]->(book3),
(categoryImprovement)-[:CONTAINS]->(book8),
(categoryImprovement)-[:CONTAINS]->(book9),
(categoryImprovement)-[:CONTAINS]->(book10),
(categoryFamily)-[:CONTAINS]->(book8),
(categoryFamily)-[:CONTAINS]->(book9),
(categoryFamily)-[:CONTAINS]->(book10);
In my eyes, your technical implementation is right, but your requirements from a professional point of view are not consistent. Let us choose an example. You expect the following record:
BOOKS: CATEGORIES:
Gardening Today, Gardening,
Gardening for Kids, Outdoors,
Green Thumb Made Easy Hobbies,
Kids,
Family
By executing the following Cypher query, the Family
entry is not a valid category for the book Gardening Today
.
MATCH (book:Book {name: 'Gardening Today'})<-[:CONTAINS]-(category:Category)
RETURN DISTINCT book.name, collect(category.name);
╒═════════════════╤═════════════════════════════════════════╕
│"book.name" │"collect(category.name)" │
╞═════════════════╪═════════════════════════════════════════╡
│"Gardening Today"│["Kids","Gardening","Hobbies","Outdoors"]│
└─────────────────┴─────────────────────────────────────────┘
Doing the cross check confirms category Family
contains completly other books.
MATCH (category:Category {name: 'Family'})-[:CONTAINS]->(book:Book)
RETURN DISTINCT category.name, collect(book.name);
╒═══════════════╤═══════════════════════════════════════════════════════════════╕
│"category.name"│"collect(book.name)" │
╞═══════════════╪═══════════════════════════════════════════════════════════════╡
│"Family" │["Family Home Projects","Easy Home Updates","Home Improvement"]│
└───────────────┴───────────────────────────────────────────────────────────────┘
This procedure continues to propagate. That's the reason why you get differently sliced result sets as expected. So your already implemented approach is correct:
MATCH path = (category:Category)-[:CONTAINS]->(book:Book)
WITH collect(category.name) AS categoryGroup, book.name AS bookName
RETURN categoryGroup, collect(bookName);
╒═════════════════════════════════════════════════════════════════╤═════════════════════════════════════════════════════════════════════╕
│"categoryGroup" │"collect(bookName)" │
╞═════════════════════════════════════════════════════════════════╪═════════════════════════════════════════════════════════════════════╡
│["Spanish","Travel","Language"] │["Spanish for Travelers","Advanced Spanish","Conversational Spanish"]│
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Home Improvement","Family","Home and Garden","Home and Garden"]│["Easy Home Updates"] │
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Hobbies","Gardening","Kids","Outdoors"] │["Gardening Today","Gardening for Kids"] │
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Hobbies","Gardening","Outdoors"] │["Green Thumb Made Easy"] │
├─────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────┤
│["Home Improvement","Family","Home and Garden"] │["Home Improvement","Family Home Projects"] │
└─────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────┘
Because the requested mapping violates the assignment rules (set theory), we can not use the usual pattern matching. Instead, we can achieve our goal through a trick by finding all connected nodes for a given book and preparing them afterwards.
Please ensure you have installed the Neo4j APOC library.
MATCH (selectedBook:Book)
WHERE selectedBook.name = 'Gardening for Kids'
CALL apoc.path.subgraphNodes(selectedBook, {uniqueness: 'NODE_GLOBAL'}) YIELD node
WITH collect(DISTINCT node) AS subgraphNodes
WITH
filter (node IN subgraphNodes
WHERE node:Category) AS categories,
filter (node IN subgraphNodes
WHERE node:Book) AS books
WITH categories, books
UNWIND categories AS category
UNWIND books AS book
RETURN collect(DISTINCT category.name) AS categoryNames, collect(DISTINCT book.name) AS bookNames;
Category
and Book
Easy Home Updates:
╒═══════════════════════════════════════════════╤═══════════════════════════════════════════════════════════════╕
│"categoryNames" │"bookNames" │
╞═══════════════════════════════════════════════╪═══════════════════════════════════════════════════════════════╡
│["Home and Garden","Family","Home Improvement"]│["Easy Home Updates","Family Home Projects","Home Improvement"]│
└───────────────────────────────────────────────┴───────────────────────────────────────────────────────────────┘
Gardening for Kids:
╒════════════════════════════════════════╤════════════════════════════════════════╕
│"categoryNames" │"bookNames" │
╞════════════════════════════════════════╪════════════════════════════════════════╡
│["Kids","Gardening","Hobbies","Outdoors"│["Gardening for Kids","Gardening Today",│
│] │"Green Thumb Made Easy"] │
└────────────────────────────────────────┴────────────────────────────────────────┘
Upvotes: 0