Wakan Tanka
Wakan Tanka

Reputation: 8042

neo4j example - graph vs relational concepts

Before I worked a little bit with relational mySQL but I am not considering myself as a guru in this area too. I have found following questions 1 2 regarding database design but I would like to know your opinion on my problem. I want to create my own sample data set on which I can testy my Cypher queries. One area that comes to my mind was the data set that will be similar to social network oriented to music listeners, something like LastFM.

So my first idea was create two type of nodes Bands and Persons:

(nir:Band   { name: "Nirvana", town: "Seatle", country: "USA",  genere: "Grunge" })
(dgr:Person { name: "Dave Grohl", born: 1969, instrument: "drums" })

As a Person I also created the users (not band members) of my social network. There are types of relations which I had:

(dgr)-[:IS_MEMBER_OF {from: 1987, to: 1994} ]->(nir)
(user1)-[:IS_FRIEND_OF]->(user6)
(user1)-[:LIKES]->(nir)

Then I realized that this concept has minimum three restrictions that I can see right now:

  1. Band can be classified only by one genre
  2. Band can originate only from one town/country
  3. Band member can only play on one instrument in all bands in which he ever was a member of

To solve first two issues I first thought about some data type which will be similar to array (as it is known from Python etc.). In this array more than one element (multiple genres or multiple towns and countries) can be stored, but I did not find anything about arrays in neo4j. Then I realized that all those restrictions can be elegantly solved naturally by neo4j, the only thing that is needed is slightly modification of nodes and relationships:

(nir:Band   { name: "Nirvana" })
(foo:Band   { name: "Foo Fighters" })

(dgr:Person { name: "Dave Grohl", born: 1969 })

(grn:Genere { name: "Grunge" })
(rck:Genere { name: "Rock" })

(dgr)-[:IS_MEMBER_OF {from: 1987, to: 1994, instrument:"drums"} ]->(nir)
(dgr)-[:IS_MEMBER_OF {from: 1994, to: 1998, instrument:"drums"} ]->(foo)
(dgr)-[:IS_MEMBER_OF {from: 1998, to: 2014, instrument:"guitar"} ]->(foo)

(stl:Town    { name: "Seatle" })
(por:Town    { name: "Portland" })

(usa:Country { name: "USA" })

(stl)->[:IS_IN]->(usa)
(por)->[:IS_IN]->(usa)

(nir)->[:IS_FROM]->(stl)
(nir)->[:IS_FROM]->(por)

(nir)->[:PLAYS]->(grn)
(nir)->[:PLAYS]->(rck)

(user1)-[:IS_FRIEND_OF]->(user6)
(user1)-[:LIKES]->(nir)

Finally my questions:

  1. Lets say that I am happy with restrictions mentioned above and It perfectly fits to my needs (band can originate from one town only etc.). Is still better to have different type of nodes (Town, Country, Genre) as It was mentioned? Is there any (performance) advantages using property in already existing node over crating completely different node type, would be relevant to have type node which represent e.g. Instrument or something completely different from future point of view?
  2. There is rule in relational database that when you have m:n relationship you will need join table. Can this be applied also on graph databases but instead of join table the creation of new node type is required (Town, Country, Genre)?

Edit in reply to @Michael Hunger

"You should ask yourself which queries / use cases do you want to solve with it" If instrument is member of IS_MEMBER_OF relationship or if instrument is member of Person I am still able (maybe Cypher query will looks more clumsy, I do not know) to obtain desired data e.g. show me all drummers played in bands that originates from USA. Of course I am limited with afore mentioned restrictions (Person can play only on one instrument etc.). My question is that if it makes sense to create another (second proposed schema) database model if I am aware of those restrictions (first proposed schema) and I am happy with them. Is there any benefits which have second proposed schema over first one? The one that I can see right now is that second schema scales well in contrast to first one, is there something else e.g performance?

"For some specific use-cases it might be interesting to model the band membership as nodes, then you can connect them to a instrument node, an time-tree (year->month->membership) or put them into an order (with next relationships)." Can you please post some simple CYPHER example of this? It is hard to imagine for me.

"Graph databases pre-materialize relationships, and store them with the nodes they connect" Does this mean that following two are basically the same from performance perspective? Because both relationships connects nodes.

CREATE (dgr:Person {name:"Dave Grohl", instrument: "drums"})-[:IS_MEMBER_OF]->(nir:Band {name:'Nirvana'})
CREATE (dgr:Person {name:"Dave Grohl"})-[:IS_MEMBER_OF {instrument: "drums"} ]->(nir:Band {name:'Nirvana'})

Upvotes: 1

Views: 252

Answers (1)

Michael Hunger
Michael Hunger

Reputation: 41676

You second model looks really great. You should ask yourself which queries / use cases do you want to solve with it, if it supports them all you're good.

For some specific use-cases it might be interesting to model the band membership as nodes, then you can connect them to a instrument node, an time-tree (year->month->membership) or put them into an order (with next relationships).

Regarding your question about join tables.

In graph databases those are not needed, the relationships take the role (but not the implementation of join tables). Graph databases pre-materialize relationships, and store them with the nodes they connect. So querying along that join is not expensive, as it is just following those existing records in the database.

So you also don't need technical primary and foreign keys. The only thing that makes sense is to index properties that you'd use to look-up your entities, e.g. :Person(name), :Band(name), same for genres, countries and towns (if you want to search for them by name).

A nice tool that might help you get started is this http://graphgen.neoxygen.io a example graph generator.

We also have several datasets and articles about the music domain if you're interested: http://www.neo4j.org/misc/music (the musicbrainz dataset is outdated and has to be updated).

Upvotes: 1

Related Questions