Dot NET
Dot NET

Reputation: 4897

How can a tree be stored in a database?

If one were to implement a tree using a 4GL like C#, and storing it in a database, such as SQL Server 2008, what would the schema/design look like?

In other words, what role would the database play in such an implementation?

Upvotes: 7

Views: 1896

Answers (2)

Branko Dimitrijevic
Branko Dimitrijevic

Reputation: 52107

Storing the Tree

There are several options:

  1. It is just a tree, after all, so you could store it the same way you would store any other tree (essentially via a recursive FOREIGN KEY).
  2. Or, convert the suffix tree into suffix array and store that into the the database.
  3. Or, you could serialize it to (say) XML, then store it to a single CLOB.
  4. Or, since a suffix tree is roughly 20 times larger than the "target" string it indexes, you could simply store the string and calculate the suffix tree on as-needed basis (e.g. using Ukkonen's algorithm).

NOTE: In the case of the suffix array, you won't be storing any characters, you'd just store the indexes describing each element, something like this:

CREATE TABLE SUFFIX_ARRAY (
    ORDER INT PRIMARY KEY, -- Position in the suffix array.
    START INT NOT NULL, -- Position of the starting character of the suffix within the target string.
    LONGEST_COMMON_PREFIX INT NOT NULL -- If useful for your application.
)

You'd also have to separately store the "target" string (e.g. in a CLOB in another table).

Using the Tree

  1. If you store the suffix tree directly, you should be able search it directly using SQL.
  2. If you store it as suffix array, you'll have to juggle a bit to implement the binary search through the SQL, but should be possible.
  3. (and 4) If you store it in CLOB (or don't store it at all and just store the target string), then obviously you won't be able to access it directly in the database (not efficiently anyway) - your only option would be to load (or re-create) it in memory.

Upvotes: 6

Clockwork-Muse
Clockwork-Muse

Reputation: 13046

Tree structures in an RDBMS is usually handled with a combination of cross-reference tables and recursive queries.

Text
================
id  -- autoincrement
text  -- varchar

Text_Suffix
=================
startingTextId  -- fk reference to Text.id
suffixPartId  -- fk reference to Text.id

So... with this example data - 

Text
=================
1  |  lay
2  |  er
3  |  ing
4  |  s

Text_Suffix
==================
1  |  2
1  |  3
1  |  4
2  |  4

You'd use a query like this:

WITH All_Suffixes (id, text) as (SELECT id, text
                                 FROM Text as a
                                 EXCEPTION JOIN Text_Suffix as b
                                 ON b.suffixPartId = a.id
                                 UNION ALL
                                 SELECT b.suffixPartId, a.text + c.text
                                 FROM All_Suffixes as a
                                 JOIN Text_Suffix as b
                                 ON b.startingTextId = a.id
                                 JOIN Text as c
                                 ON c.id = b.suffixPartId)
SELECT *
FROM All_Suffixes

Which should generates results like so:

1  |  lay
2  |  layer
3  |  laying
4  |  lays
4  |  layers

Upvotes: 4

Related Questions