Reputation: 1322
Please see the image here:
https://picasaweb.google.com/108987384888529766314/CS3217Project#5717590602842112850
So, as you can see from the image, we are trying to store hierarchical data into a database. 1 publisher has may articles, 1 article has many comments and so on. Thus, if I use a relational database like SQL Server, I will have a publisher table, then an articles table and a comments table. But the comments table will grow very quickly and become very large.
Thus, is there any alternative which allows me to store and query such tree like data efficiently? How about NoSQL (MongoDB)?
Upvotes: 9
Views: 18948
Reputation: 720
I found this SO post when searching the same thing, The URL posted by Phpdevpad is a great read to understand how Adjacency List Model and Nested Set Model work and compare against each other. The article is very much in favor of the Nested Set Model and explains many draw backs to the Adjacency List Model, however I was greatly concerned about the mass updates the nested method would cause.
The main limitation to adjacency lists outlined in the article was that an additional self join was required for each layer of depth. However this limitation is easily overcome with the use of another language (such as php) and a recessive function for finding children such as outlined here: http://www.sitepoint.com/hierarchical-data-database/
<?php
// $parent is the parent of the children we want to see
// $level is increased when we go deeper into the tree,
// used to display a nice indented tree
function display_children($parent, $level) {
// retrieve all children of $parent
$result = mysql_query('SELECT title FROM tree WHERE parent="'.$parent.'";');
// display each child
while ($row = mysql_fetch_array($result)) {
// indent and display the title of this child
echo str_repeat(' ',$level).$row['title']."n";
// call this function again to display this
display_children($row['title'], $level+1);
}
}
// $node is the name of the node we want the path of
function get_path($node) {
// look up the parent of this node
$result = mysql_query('SELECT parent FROM tree WHERE title="'.$node.'";');
$row = mysql_fetch_array($result);
// save the path in this array
$path = array();
// only continue if this $node isn't the root node
// (that's the node with no parent)
if ($row['parent']!='') {
// the last part of the path to $node, is the name
// of the parent of $node
$path[] = $row['parent'];
// we should add the path to the parent of this node
// to the path
$path = array_merge(get_path($row['parent']), $path);
}
// return the path
return $path;
}
display_children('',0);
As a result I am now convinced that the Adjacency List Model will be far easier to use and manage moving forward.
Upvotes: 1
Reputation: 6408
Most NOSQL database design involves a mix of the following techniques:
The schema you craft depends on various aspects of you data. One solution to your problem may be the following schema:
db.articles { _id: ARTICLE_ID; publisher: "publisher name"; ... }
db.comments { _id: COMMENT_ID; article_id: ARTICLE_ID; ... }
Here the publisher is embedded in an article document. We can do this because it's unlikely the publisher name will change. It also saves us having to look up publisher details every time we need to access an article.
The comments are stored in their own documents, with each comment linking to an article. To find all comments associated to an article you can
db.comments.find({article_id:"My Atticle ID"}]
and to speed things up you could always add "article_id" to the index
db.comments.ensureIndex({article_id:1})
Upvotes: 2
Reputation:
Here is good survey of 8 NoSQL distributed databases and the needs that they fill.
Do you anticipate you will write more than you read?
Do you anticipate you will need low-latency data access, high concurrency support and high availability is a requirement?
Do you need dynamic queries?
Do you prefer to define indexes, not map/reduce functions?
Is versioning important?
Do you anticipate you will accumulate occasionally changing data, on which pre-defined queries are to be run?
Do you anticipate you will rapidly changing data with a foreseeable database size (should fit mostly in memory)?
Do you anticipate graph-style, rich or complex, interconnected data?
Do you anticipate you will need random, realtime read/write access to BigTable-like data?
Upvotes: 2
Reputation: 12592
You can use adjacent lists for hierarchical data. It's efficient and easy to implement. It works also with MySQL. Here a link: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/.
Upvotes: 4