discoverAnkit
discoverAnkit

Reputation: 1151

Using B-Tree instead of Trie

This is an interview question and not a homework.

"You have N documents, where N is very large. Each document has a set of words lets say w1,w2..wm where m might differ for each document. Now you are given a list to K words lets say q1,q2…qk. Write an algorithm to print the list of document which have the K words in them."

Now, I could figure out solutions using Hashing and trie. But the guy who posted the question had also written that the interviewer wanted a solution using B-tree.

I am not really able to figure out how to use a B-Tree for this and how efficient would that be. Can somebody please help?

Upvotes: 3

Views: 448

Answers (2)

Ivan Zaitsau
Ivan Zaitsau

Reputation: 56

B-Tree is preferred over Trie if our dataset is stored on media with slow random access, for example on conventional hard drives. The interviewer's note that N is very large might imply that it's simply large enough to not fit in memory and should be placed on disk.

As noted in comments: when the data is really huge and it is stored on a disk, the efficiency of a data structure depends more on the number of disk block accesses, not the total amount of all operations. B-Tree contains many records in one node (which can be considered a "data block"), thus requires significantly fewer block accesses than Trie does.

That is exactly the same reason why most DBs store their indexes in a B-Tree. They need fast search through index located on conventional hard drive. Actually, your problem can be solved by putting your (word-documentId) pairs in DB table and creating an index on word column or the entire pair.

Upvotes: 1

Cybercartel
Cybercartel

Reputation: 12592

You can try a ternary trie. It doesn't take so much space. You can also look for a Kart-trie. It uses a key and 2 leafs:http://code.dogmap.org/kart/.

Upvotes: 0

Related Questions