Ninja420

Reputation: 3872

Reducing memory usage while making a trie/suffix tree in C++

I am trying to make a trie in C++. My basic data structure looks like this:

struct node {
    int count;                         // number of times this node has been visited
    struct node* child[ALPHABET_SIZE]; // let ALPHABET_SIZE be 26
};

When the strings get long, a lot of the allocated memory is wasted. For example, if we insert "he", our tree would be

root---->h--->e
    |--->e

We see that at the root only 2 of the 26 allocated child pointers are actually used. How can this be improved?

Upvotes: 4

Views: 3487

Answers (3)

ash

Reputation: 21

Use an adjacency list.

Rather than a pointer-based tree, we can create a flat list of nodes. Each node is a dictionary-like record holding a "current value" (the letter) and a "next state" (a list of indices of its child nodes). Other required attributes can be added to the node as needed.

In your case, the list will be:

[{"value": "", "next_state": [1]}, {"value": "h", "next_state": [2]}, {"value": "e", "next_state": []}]

Now say we add "his". The list will be updated to:

[{"value": "", "next_state": [1]}, {"value": "h", "next_state": [2, 3]}, {"value": "e", "next_state": []}, {"value": "i", "next_state": [4]}, {"value": "s", "next_state": []}]

Notice the next_state of the node at index 1: it now has two child nodes, "e" and "i".

It's memory-efficient and easy to implement. However, the trie operations will be considerably slower, since finding a child means scanning a list instead of indexing an array.
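A minimal C++ sketch of this flat-list idea (the names `FlatNode`/`FlatTrie` and the linear child search are illustrative assumptions, not the answerer's exact code):

```cpp
#include <string>
#include <vector>

// Each node stores its letter and the indices of its children;
// all nodes live in one contiguous vector instead of a pointer tree.
struct FlatNode {
    char value;                  // '\0' marks the root
    std::vector<int> next_state; // indices of child nodes
};

struct FlatTrie {
    std::vector<FlatNode> nodes{{'\0', {}}}; // index 0 is the root

    void insert(const std::string& word) {
        int cur = 0;
        for (char c : word) {
            int next = -1;
            for (int idx : nodes[cur].next_state) // linear child search
                if (nodes[idx].value == c) { next = idx; break; }
            if (next == -1) {                     // letter not found: append a node
                next = static_cast<int>(nodes.size());
                nodes.push_back({c, {}});
                nodes[cur].next_state.push_back(next);
            }
            cur = next;
        }
    }
};
```

After `insert("he")` and `insert("his")`, `nodes` holds the five entries shown in the lists above; the linear scan through `next_state` is exactly what makes lookups slower than a fixed 26-slot array.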

Upvotes: 0

stan0

Reputation: 11807

Instead of creating a fixed-size array for each node, create an array with one element and resize it (replace it with a new array of size + 1) when you insert a child. Inserting becomes slower, so if it gets too slow, benchmark and adjust the growth policy (size + 1, size * 2, or size + size/2) to reduce the number of allocations.
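A sketch of this growable-children approach, using a `std::vector` of (letter, child) pairs as the resizable array (the `Node` layout and the `get_or_add` helper are illustrative assumptions, not the answerer's exact code):

```cpp
#include <utility>
#include <vector>

// Each node starts with zero children and grows its child array only
// when a new letter is actually inserted. std::vector's built-in
// growth policy stands in for the "size + 1 vs size * 2" tuning the
// answer describes; reserve() could pin it down further.
struct Node {
    int count = 0;                              // visits through this node
    std::vector<std::pair<char, Node*>> child;  // grows on demand

    Node* get_or_add(char c) {
        for (auto& [letter, ptr] : child)       // linear scan of children
            if (letter == c) return ptr;
        child.emplace_back(c, new Node{});      // resize happens here
        return child.back().second;             // (ownership omitted for brevity)
    }
};
```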

Upvotes: 0

Michael Goldshteyn

Reputation: 74390

Some very basic advice:

  1. If your branching factor is predicted to be low, consider using something other than an array for the children. For example, you can have an array of letter to node* pairs and either do a linear or a binary search on them (if they are ordered). You can also use a map of some sort.
  2. You can also play with smaller integer sizes for the count, if you don't expect counts in the millions/billions.
  3. Instead of dynamically allocating nodes, you can get them from an arena based allocator (i.e., object pool), avoiding the heap allocation overhead that is often added to objects allocated on the heap.
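The three points above can be sketched together as follows; the `Arena` type and its deque-based pool are an assumption for illustration, not a specific library:

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

struct Node {
    std::uint16_t count = 0;                    // point 2: smaller integer for the count
    std::vector<std::pair<char, Node*>> child;  // point 1: sparse children, kept sorted

    Node* find(char c) {                        // binary search over the sorted pairs
        auto it = std::lower_bound(child.begin(), child.end(), c,
            [](const std::pair<char, Node*>& p, char ch) { return p.first < ch; });
        return (it != child.end() && it->first == c) ? it->second : nullptr;
    }
    void add(char c, Node* n) {                 // insert while preserving sorted order
        auto it = std::lower_bound(child.begin(), child.end(), c,
            [](const std::pair<char, Node*>& p, char ch) { return p.first < ch; });
        child.insert(it, {c, n});
    }
};

struct Arena {                                  // point 3: arena / object pool
    std::deque<Node> pool;                      // deque never relocates its elements,
    Node* make() {                              // so returned pointers stay valid
        pool.emplace_back();
        return &pool.back();
    }
};
```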

Upvotes: 2

Related Questions