Reputation: 8042
Before I worked a little bit with relational mySQL but I am not considering myself as a guru in this area too. I have found following questions 1 2 regarding database design but I would like to know your opinion on my problem. I want to create my own sample data set on which I can testy my Cypher queries. One area that comes to my mind was the data set that will be similar to social network oriented to music listeners, something like LastFM.
So my first idea was create two type of nodes Bands and Persons:
(nir:Band { name: "Nirvana", town: "Seatle", country: "USA", genere: "Grunge" })
(dgr:Person { name: "Dave Grohl", born: 1969, instrument: "drums" })
As a Person I also created the users (not band members) of my social network. There are types of relations which I had:
(dgr)-[:IS_MEMBER_OF {from: 1987, to: 1994} ]->(nir)
(user1)-[:IS_FRIEND_OF]->(user6)
(user1)-[:LIKES]->(nir)
Then I realized that this concept has minimum three restrictions that I can see right now:
To solve first two issues I first thought about some data type which will be similar to array (as it is known from Python etc.). In this array more than one element (multiple genres or multiple towns and countries) can be stored, but I did not find anything about arrays in neo4j. Then I realized that all those restrictions can be elegantly solved naturally by neo4j, the only thing that is needed is slightly modification of nodes and relationships:
(nir:Band { name: "Nirvana" })
(foo:Band { name: "Foo Fighters" })
(dgr:Person { name: "Dave Grohl", born: 1969 })
(grn:Genere { name: "Grunge" })
(rck:Genere { name: "Rock" })
(dgr)-[:IS_MEMBER_OF {from: 1987, to: 1994, instrument:"drums"} ]->(nir)
(dgr)-[:IS_MEMBER_OF {from: 1994, to: 1998, instrument:"drums"} ]->(foo)
(dgr)-[:IS_MEMBER_OF {from: 1998, to: 2014, instrument:"guitar"} ]->(foo)
(stl:Town { name: "Seatle" })
(por:Town { name: "Portland" })
(usa:Country { name: "USA" })
(stl)->[:IS_IN]->(usa)
(por)->[:IS_IN]->(usa)
(nir)->[:IS_FROM]->(stl)
(nir)->[:IS_FROM]->(por)
(nir)->[:PLAYS]->(grn)
(nir)->[:PLAYS]->(rck)
(user1)-[:IS_FRIEND_OF]->(user6)
(user1)-[:LIKES]->(nir)
Finally my questions:
Edit in reply to @Michael Hunger
"You should ask yourself which queries / use cases do you want to solve with it"
If instrument
is member of IS_MEMBER_OF
relationship or if instrument
is member of Person
I am still able (maybe Cypher query will looks more clumsy, I do not know) to obtain desired data e.g. show me all drummers played in bands that originates from USA. Of course I am limited with afore mentioned restrictions (Person can play only on one instrument etc.). My question is that if it makes sense to create another (second proposed schema) database model if I am aware of those restrictions (first proposed schema) and I am happy with them. Is there any benefits which have second proposed schema over first one? The one that I can see right now is that second schema scales well in contrast to first one, is there something else e.g performance?
"For some specific use-cases it might be interesting to model the band membership as nodes, then you can connect them to a instrument node, an time-tree (year->month->membership) or put them into an order (with next relationships)." Can you please post some simple CYPHER example of this? It is hard to imagine for me.
"Graph databases pre-materialize relationships, and store them with the nodes they connect" Does this mean that following two are basically the same from performance perspective? Because both relationships connects nodes.
CREATE (dgr:Person {name:"Dave Grohl", instrument: "drums"})-[:IS_MEMBER_OF]->(nir:Band {name:'Nirvana'})
CREATE (dgr:Person {name:"Dave Grohl"})-[:IS_MEMBER_OF {instrument: "drums"} ]->(nir:Band {name:'Nirvana'})
Upvotes: 1
Views: 252
Reputation: 41676
You second model looks really great. You should ask yourself which queries / use cases do you want to solve with it, if it supports them all you're good.
For some specific use-cases it might be interesting to model the band membership as nodes, then you can connect them to a instrument node, an time-tree (year->month->membership) or put them into an order (with next relationships).
Regarding your question about join tables.
In graph databases those are not needed, the relationships take the role (but not the implementation of join tables). Graph databases pre-materialize relationships, and store them with the nodes they connect. So querying along that join is not expensive, as it is just following those existing records in the database.
So you also don't need technical primary and foreign keys. The only thing that makes sense is to index properties that you'd use to look-up your entities, e.g. :Person(name), :Band(name), same for genres, countries and towns (if you want to search for them by name).
A nice tool that might help you get started is this http://graphgen.neoxygen.io a example graph generator.
We also have several datasets and articles about the music domain if you're interested: http://www.neo4j.org/misc/music (the musicbrainz dataset is outdated and has to be updated).
Upvotes: 1