Reputation: 1322

Efficient way to store and query tree-like hierarchical data

Please see the image here:

https://picasaweb.google.com/108987384888529766314/CS3217Project#5717590602842112850

So, as you can see from the image, we are trying to store hierarchical data into a database. 1 publisher has may articles, 1 article has many comments and so on. Thus, if I use a relational database like SQL Server, I will have a publisher table, then an articles table and a comments table. But the comments table will grow very quickly and become very large.

Thus, is there any alternative which allows me to store and query such tree like data efficiently? How about NoSQL (MongoDB)?

Upvotes: 9

Answers (4)

tempcke

Reputation: 720

I found this SO post when searching the same thing, The URL posted by Phpdevpad is a great read to understand how Adjacency List Model and Nested Set Model work and compare against each other. The article is very much in favor of the Nested Set Model and explains many draw backs to the Adjacency List Model, however I was greatly concerned about the mass updates the nested method would cause.

The main limitation to adjacency lists outlined in the article was that an additional self join was required for each layer of depth. However this limitation is easily overcome with the use of another language (such as php) and a recessive function for finding children such as outlined here: http://www.sitepoint.com/hierarchical-data-database/

snippet from url above using the Adjacency List Model

<?php
// $parent is the parent of the children we want to see
// $level is increased when we go deeper into the tree,
//        used to display a nice indented tree 
function display_children($parent, $level) {

  // retrieve all children of $parent
  $result = mysql_query('SELECT title FROM tree WHERE parent="'.$parent.'";');

  // display each child
  while ($row = mysql_fetch_array($result)) {

    // indent and display the title of this child
    echo str_repeat('  ',$level).$row['title']."n";

    // call this function again to display this
    display_children($row['title'], $level+1);
  }
}

// $node is the name of the node we want the path of
function get_path($node) {

  // look up the parent of this node
  $result = mysql_query('SELECT parent FROM tree WHERE title="'.$node.'";');
  $row = mysql_fetch_array($result);

  // save the path in this array
  $path = array();

  // only continue if this $node isn't the root node
  // (that's the node with no parent)
  if ($row['parent']!='') {

    // the last part of the path to $node, is the name
    // of the parent of $node
    $path[] = $row['parent'];

    // we should add the path to the parent of this node
    // to the path
    $path = array_merge(get_path($row['parent']), $path);
  }

  // return the path
  return $path;
}
display_children('',0);

Conclusion

As a result I am now convinced that the Adjacency List Model will be far easier to use and manage moving forward.

Upvotes: 1

Ivo Bosticky

Reputation: 6408

Most NOSQL database design involves a mix of the following techniques:

Embedding - nesting of objects and arrays inside a document
Linking - references between documents

The schema you craft depends on various aspects of you data. One solution to your problem may be the following schema:

db.articles { _id: ARTICLE_ID;  publisher: "publisher name";     ...    }
db.comments { _id: COMMENT_ID; article_id: ARTICLE_ID;    ... }

Here the publisher is embedded in an article document. We can do this because it's unlikely the publisher name will change. It also saves us having to look up publisher details every time we need to access an article.

The comments are stored in their own documents, with each comment linking to an article. To find all comments associated to an article you can

db.comments.find({article_id:"My Atticle ID"}]

and to speed things up you could always add "article_id" to the index

db.comments.ensureIndex({article_id:1})

Upvotes: 2

user78706

Reputation:

Here is good survey of 8 NoSQL distributed databases and the needs that they fill.

Do you anticipate you will write more than you read?
Do you anticipate you will need low-latency data access, high concurrency support and high availability is a requirement?
Do you need dynamic queries?
Do you prefer to define indexes, not map/reduce functions?
Is versioning important?
Do you anticipate you will accumulate occasionally changing data, on which pre-defined queries are to be run?
Do you anticipate you will rapidly changing data with a foreseeable database size (should fit mostly in memory)?
Do you anticipate graph-style, rich or complex, interconnected data?
Do you anticipate you will need random, realtime read/write access to BigTable-like data?

Upvotes: 2

Cybercartel

Reputation: 12592

You can use adjacent lists for hierarchical data. It's efficient and easy to implement. It works also with MySQL. Here a link: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/.

Upvotes: 4

Efficient way to store and query tree-like hierarchical data

Answers (4)

snippet from url above using the Adjacency List Model

Conclusion

Related Questions