Himmators
Himmators

Reputation: 15006

What is the simplest way to save a file-tree in a postgres database?

I'm retrieving a file-tree from the dropbox api. In the api every folder is read with a separate, api call, so I will iterate through the entire file-tree to get all the folders. This is done with a cron-job.

The function that retrievs the data from dropbox looks like this:

function renderFolderTree($myobject, $path){
    $entry = $myobject->getMetadataWithChildren($path);
    foreach ($entry['contents'] as $child) {
        if ($child['is_dir']){
            $folderpath = $child['path'];
            //this will retrieve the child-folder
            renderFolderTree($myobject, $folderpath, $filetree);
            //here I need something that saves the folder
        }else{
            print_r($child);
            //here I need something that saves the file
        }
    }
}

I want to save the file-tree to a postgres-database so that it later can be ouputted as a json object that represents it.

I'm new to database design and am not sure what way to save the data. I assume every file and folder should have a database entry of their own. I could let every child reference it's parents ID, or I could let every parent contain a list of it's children.

As I am novice I would like a solution that is reasonably simple and reading speed is much more important than writing!

Upvotes: 3

Views: 5289

Answers (1)

Fabian
Fabian

Reputation: 2982

There are several options to store a tree in a relational database. For a good overview, I recommend the slides of Bill Karwin.

Since you mentioned that reading speed is most important, a closure table would be an appropriate, powerful encoding. A closure table is a many-to-many relation that stores for each path (e.g., /a/b/c) all parents/children (transitively). This way, many queries to the tree can be done with one single SQL query (non-recursively).

That would look like

create table nodes (
    path varchar primary key
    /* your other attributes here, can be null */
);

create table parents_children (
    parent_path varchar,
    child_path varchar,
    primary key(parent_path,child_path),
    foreign key (parent_path) references nodes (path),
    foreign key (child_path) references nodes (path)
);

To insert a new file /a/b/c under directory /a/b/, you would do:

insert into nodes values ('/a/b/c');

insert into parents_children
select parent_path, '/a/b/c' from parents_children where child_path = '/a/b/'
union all select '/a/b/c','/a/b/c';

To query, for example, all children from '/a' on recursively, you would do:

select * 
from nodes join parents_children on path = child_path
where parent_path = '/a';

A more exhaustive example, which stores the following file tree:

/
/a/
/a/b/
/a/b/d
/a/c
/b

To insert the data:

insert into nodes values ('/');
insert into parents_children values ('/','/');

insert into nodes values ('/a/');
insert into parents_children
select parent_path, '/a/' from parents_children where child_path = '/'
union all select '/a/','/a/';

insert into nodes values ('/a/b/');
insert into parents_children
select parent_path, '/a/b/' from parents_children where child_path = '/a/'
union all select '/a/b/','/a/b/';

insert into nodes values ('/a/c');
insert into parents_children
select parent_path, '/a/c' from parents_children where child_path = '/a/'
union all select '/a/c','/a/c';

insert into nodes values ('/a/b/d');
insert into parents_children
select parent_path, '/a/b/d' from parents_children where child_path = '/a/b/'
union all select '/a/b/d','/a/b/d';

insert into nodes values ('/b');
insert into parents_children
select parent_path, '/b' from parents_children where child_path = '/'
union all select '/b','/b';

To query all children of /a/

select node.*
from nodes join parents_children on path = child_path
where parent_path = '/a/';

path        
----------  
/a/         
/a/b/       
/a/b/d      
/a/c        

Upvotes: 12

Related Questions