Reputation: 2579
I've spent a couple of hours reading posts that were related to the question in a bid to try and come up with a solution, but I wasn't really successful in coming up with one.
So here goes: I was once asked in an interview which data structure I would use to search a if a particular word existed in a file. The file is also supposedly big enough to not be able to fit in the memory and the interviewer was really looking for an on-disk solution.
Is the B-Tree an on-disk data structure?
A Binary Search Tree is an in-memory data structure isn't it?
Upvotes: 2
Views: 2050
Reputation: 59983
There are really two different possible questions here:
Given a massive file, and a word, how do you check if the word exists in the file?
Given a massive file, how do you build an index so that you can efficiently check if an arbitrary word exists in the file?
The first problem is efficiently solved with Boyer-Moore and a linear search through the file. If you're only searching once, building an index is a complete waste of time.
Regarding the second problem, it sounds like the interviewer is really pushing B-Trees.
Upvotes: 4
Reputation: 82579
You want to use a data structure that maps one node to one page of disk space. This will minimize disk activity.
Because a B-Tree is often used for this. See http://en.wikipedia.org/wiki/B-tree, specifically the section "Time to search a sorted file".
Upvotes: 2
Reputation:
Both are just data-structures and can be both on-disk or in-memory. It depends on how you choose to use them.
btw, B-Trees were motivated by a need to have on-disk structures. Binary search trees are just a special case of B-trees, in one way.
Upvotes: 2