\n
Foo\n / \\\nBar1 Bar2\n
\n id | parent_id | name | attribute 1 | attribute 2\n 1 | null | Foo | 0.4 | "tastes like chicken"\n 2 | 1 | Bar 1 | null | "doesn't everything"\n 3 | 1 | Bar 2 | 0.2 | null\n
\n(In production I would use an additional column(s) to store the tree structure, but this is omitted for the sake of simplicity.)
\nIn this system, Bar1 would inherit a value of 0.4 for attribute 1, and Bar2 would inherit a value of "tastes like chicken" for attribute 2.
\nAs I am far from a guru of any stripe, I have several questions about how best to work with system.
\nAs I need to be able to export from the database in the original format, I can't simply pre-calculate and cache the "missing" records. Or can I? Note that caching would have be somewhat intelligent, as an update may change values in Foo that should propogate down to Bar1 and Bar2.
\nAre there standard database tools or SQL parameters that could "build" the Bar records correctly in one (or relatively few) queries? I know about common table expressions, and it seems like this is halfway to a solution, but this would return the tree of rows, and then further processing would be required to fill out all attributes appropriately.
\nAre their other tips and trips for working with this type of data and database structure? Maybe this is a silly question, but I don't have any formal programming education, so many concepts are quite new to me.
\nNote that creating separate tables for the various branches is not a possibility, as the tree can have arbitrary depth. Anticipated tree size is ~ 3000 child records, with a tree 3 levels deep, and thousands of trees to be stored.
\nI am working with Django, if that matters, but am semi-comfortable working with raw SQL queries (e.g. I build a Django app to manipulate directed acyclic graphs, which uses mostly raw SQL because of the complicated grouped by and having clauses).
\n","author":{"@type":"Person","name":"Chris Mutel"},"upvoteCount":0,"answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"You did not specify your database, but looking at your link to the directed acyclic graphs I saw that you use PostgreSQL. If you can settle with the newest version 8.4 you can try the following (that should answer your 2nd question):
\n\nUsing the table
\n\ncreate table Tree(\n id serial primary key check(id > 0),\n parent int references Tree(id),\n name varchar(100) not null unique check(length(name)>0),\n attr1 int,\n attr2 varchar(15),\n constraint charlength check(id > parent)\n);\n
\n\nfilled with this data
\n\ninsert into Tree (name, parent, attr1, attr2)\n values ('Foo', null, 5, 'high'), -- will have id 1\n ('Bar1', 1, null, 'low'), -- will have id 2\n ('Bar2', 1, null, null),\n ('Bar3', 2, null, null);\n
\n\nthe query
\n\nwith recursive parents(id, name, attr1, attr2, parent, level, who) as (\n select id, name, attr1, attr2, parent, 1, id\n from Tree t\n where t.name in ('Bar3', 'Bar2')\n union all\n select t.id, p.name, coalesce(p.attr1, t.attr1), coalesce(p.attr2, t.attr2), t.parent, p.level+1, p.who\n from parents p, Tree t\n where t.id = p.parent\n ) select distinct on (who) who, name, attr1, attr2\n from parents\n order by who, level desc;\n
\n\nyields (with my ruby program wrapping the queries):
\n\n{\"attr1\"=>\"5\", \"name\"=>\"Bar2\", \"attr2\"=>\"high\", \"who\"=>\"3\"}\n{\"attr1\"=>\"5\", \"name\"=>\"Bar3\", \"attr2\"=>\"low\", \"who\"=>\"4\"}\n
\n\nAbout your 1st question I'm not sure what you mean. But maybe you can recover the XML from its presentation in the database?!? For your 3rd question I hope to be able to post this weekend an alternative for storing trees. Otherwise I'd say that your ideas look good to me.
\n","author":{"@type":"Person","name":"j.p."},"upvoteCount":1}}}Reputation: 2779
I have a set of records, stored as XML files, where the XML files are arranged in a tree structure. For each child record, elements or attributes which are not explicitly stated are assumed to be inherited from the parent record. This is easy to model in a database, with a self-referential foreign key, e.g.
Foo
/ \
Bar1 Bar2
id | parent_id | name | attribute 1 | attribute 2
1 | null | Foo | 0.4 | "tastes like chicken"
2 | 1 | Bar 1 | null | "doesn't everything"
3 | 1 | Bar 2 | 0.2 | null
(In production I would use an additional column(s) to store the tree structure, but this is omitted for the sake of simplicity.)
In this system, Bar1 would inherit a value of 0.4 for attribute 1, and Bar2 would inherit a value of "tastes like chicken" for attribute 2.
As I am far from a guru of any stripe, I have several questions about how best to work with system.
As I need to be able to export from the database in the original format, I can't simply pre-calculate and cache the "missing" records. Or can I? Note that caching would have be somewhat intelligent, as an update may change values in Foo that should propogate down to Bar1 and Bar2.
Are there standard database tools or SQL parameters that could "build" the Bar records correctly in one (or relatively few) queries? I know about common table expressions, and it seems like this is halfway to a solution, but this would return the tree of rows, and then further processing would be required to fill out all attributes appropriately.
Are their other tips and trips for working with this type of data and database structure? Maybe this is a silly question, but I don't have any formal programming education, so many concepts are quite new to me.
Note that creating separate tables for the various branches is not a possibility, as the tree can have arbitrary depth. Anticipated tree size is ~ 3000 child records, with a tree 3 levels deep, and thousands of trees to be stored.
I am working with Django, if that matters, but am semi-comfortable working with raw SQL queries (e.g. I build a Django app to manipulate directed acyclic graphs, which uses mostly raw SQL because of the complicated grouped by and having clauses).
Upvotes: 0
Views: 517
Reputation: 1041
You did not specify your database, but looking at your link to the directed acyclic graphs I saw that you use PostgreSQL. If you can settle with the newest version 8.4 you can try the following (that should answer your 2nd question):
Using the table
create table Tree(
id serial primary key check(id > 0),
parent int references Tree(id),
name varchar(100) not null unique check(length(name)>0),
attr1 int,
attr2 varchar(15),
constraint charlength check(id > parent)
);
filled with this data
insert into Tree (name, parent, attr1, attr2)
values ('Foo', null, 5, 'high'), -- will have id 1
('Bar1', 1, null, 'low'), -- will have id 2
('Bar2', 1, null, null),
('Bar3', 2, null, null);
the query
with recursive parents(id, name, attr1, attr2, parent, level, who) as (
select id, name, attr1, attr2, parent, 1, id
from Tree t
where t.name in ('Bar3', 'Bar2')
union all
select t.id, p.name, coalesce(p.attr1, t.attr1), coalesce(p.attr2, t.attr2), t.parent, p.level+1, p.who
from parents p, Tree t
where t.id = p.parent
) select distinct on (who) who, name, attr1, attr2
from parents
order by who, level desc;
yields (with my ruby program wrapping the queries):
{"attr1"=>"5", "name"=>"Bar2", "attr2"=>"high", "who"=>"3"}
{"attr1"=>"5", "name"=>"Bar3", "attr2"=>"low", "who"=>"4"}
About your 1st question I'm not sure what you mean. But maybe you can recover the XML from its presentation in the database?!? For your 3rd question I hope to be able to post this weekend an alternative for storing trees. Otherwise I'd say that your ideas look good to me.
Upvotes: 1