Chris
Chris

Reputation: 4485

Create trie on ssd. How to manage object movements to other locations?

I need to create a trie on ssd. I can't use much RAM as the trie is very big, but 4 GB RAM are no problem.

Currently I think about doing it the following way:

Now I am looking for tools that could help. I have problems when a object (node) gets larger. I need to find a new location in the file for this object, changing all links to this object. And then I am left with a gap in my file. Then I would need to compress my tree and change all positions of all objects to close some gaps. Leaving some space after each object would lead to very much space requirements.

Do you know libraries or have some hints to solve this problem or that could help programming all this?

Upvotes: 1

Views: 386

Answers (2)

usr
usr

Reputation: 171178

I am trying to provide a new angle to this problem here: Why don't you store the trie-nodes in a database like SQLite? SQLite is fast, well tested and feature rich. It is likely to do a much better job than you.

Relational databases are not really made to store trees, but they can. I cannot think of any particular query problem that you could solve significantly better by writing a custom trie on-disk data structure.

Upvotes: 0

Blindy
Blindy

Reputation: 67352

Edit: this is for the memory mapped file approach, I really liked your intuition on that.

Edit2: every time I say "point" or "pointer", I actually mean a zero-based offset from the beginning of the file. Since written data never moves around, the nodes' position acts as global identifiers for them.

A node shouldn't really get larger though. The way I'd do it is have the node be like:

  • the character held by the node (UTF-8 encoded if needed)
  • an array of say 8 items that hold pointers to its children. this is statically dimensioned with NULL (or 0) specifying no more children. This list never gets shorter, only larger.
  • a pointer to a piece of memory that holds another array of children pointers, also statically dimensioned. you always have this even if you don't actually need the extra space, you can just write NULL in it.
    • if pointing to actual valid memory, right after the list you'd have another pointer to an extra list if needed, so you can go however far you need to. Alternatively, the second list can be big enough to hold all characters.

As an alternative, statically allocate memory enough for all characters from the start. This may get too large too fast though, depending on the sparseness of your tree.

Either way, note that with this way your actual node size never increases, it has a static length. You can add the extra nodes or extra list chunks at the end of your file as needed, and hold a root at the beginning to point to all its children so you never have to mess with the head.

Upvotes: 1

Related Questions