Jay Kim
Jay Kim

Reputation: 843

what data structure to use for range searches?

Trying to make a simple program to catalogue books. Something like this, for example:

struct book{
    string author;
    string title;
    int catalogNumber;
}

Ultimately, I want to be able to do title searches based on a range. So the user could specify to display results of books where the title begins with "aa" though "be". Ideally, the search average case would be logarithmic.

Is there something in the STL that could help me? Otherwise, what is the best way to go about this?

Thanks!

Upvotes: 0

Views: 1155

Answers (3)

Stefan Marinov
Stefan Marinov

Reputation: 581

You can put your elements in a std::set. The problem with that is that you'd probably like your users to be able to search by title as well as by author. A solution is just to maintain two sets, but if your data changes this can be tricky to maintain and you need twice as much space.

You can always write something like Trie, but chances are your data will change and it becomes harder to maintain the logarithmic search time. You can implement any kind of Self-balancing binary search tree, but that's essentially what a set is - a Red-black tree. Writing one is not the easiest task, however...

Update: You can hash everything and implement some form of the Rabin-Karp string search algorithm, but you should be aware that there are collisions possible if you do it. You can reduce the probability of one by double-hashing and/or using good hashing functions.

Upvotes: 1

amit
amit

Reputation: 178451

You can use a trie [expanding @smarinov suggestion here]:

Finding the set of relevant words with a common prefix is farily easy in a trie, just follow pointers in the trie until you reach the node representing the desired common prefix. This node is the trie containing the desired common prefix.

In your example, you will need:

range("aa","be") = prefix("a") + (prefix("b[a-e]")

The complexity expected for this OP is O(|S|), where |S| is the length of the common prefix. Note that any algorithm is expected to be not better then it [O(logn) algorithms are actually O(|S| * logn) because the compare op depends on the length of the string.

Upvotes: 1

Jerry Coffin
Jerry Coffin

Reputation: 490138

You can store them in an std::set, and use std::lower_bound and std::upper_bound to find a range (and yes, that should be logarithmic). To do that, you'll need to define operator< to operate on just the field(s) you care about (title, in this case).

If you're (virtually) always treating the title as the key, you might prefer to use an std::map<std::string, info>, with info defined like:

struct info { 
     string author;
     int catalogNumber;

     info(string a, int c) : author(a), catalogNumber(c) {}
};

This makes a few operations a little easier, such as:

books["Moby Dick"] = info("Herman Melville", 1234);

If you want to support searching by title or author (for example) consider using something like a Boost bimap or multi_index.

For what it's worth, I'd also give serious thought to using a string instead of an int for the catalog number. Almost none of the standard numbering systems (e.g., Dewey decimal, library of congress, ISBN) will store very nicely in an int.

Upvotes: 4

Related Questions