Paul Rooney
Paul Rooney

Reputation: 21609

Fastest data structure or algorithm for fast lookup on 2 keys

In my application I'm storing a collection of data structures which hold 2 integer reference values.

I am using a std::map with the internal reference as the key, but this this then leaves me with a problem that if I have to look up by the external reference, I have to potentially iterate over the whole map to find the right entry. As this list could potentially contain thousands of entries, this is painful to consider.

The following code shows a simple example.

#include <iostream>
#include <map>

class MyData
{
    public:
    MyData(int internal_id, int external_id)
        : internal_id_(internal_id), external_id_(external_id) 
    {}
    int internal_id_;
    int external_id_;
    /* more data members ... */
};

int main(int argc, char** argv)
{
    std::map<int, MyData*> datamap;

    /*
        Build the map structure with arbitrary values.
    */
    for(int i = 0; i < 100; ++i)
    {
        MyData* md = new MyData(i, (100 - i));
        std::cout << md->internal_id_ << " " << md->external_id_ << std::endl; 
        datamap.insert(std::make_pair(i, md));
    }

    /*
        Find with internal id 50 Cheap lookup O(log N) (I think)

    */
    std::map<int, MyData*>::iterator it1;
    if((it1 = datamap.find(50)) != datamap.end())
    {
        std::cout << "Found Mydata with internal id 50 external id is " << it1->second->external_id_ << std::endl;
    }

    /* 
        Find with external id 35. Expensive lookup O(N)
    */
    std::map<int, MyData*>::iterator it2;

    for(it2 = datamap.begin(); it2 != datamap.end(); ++it2)
    {
        if(it2->second->external_id_ == 35)
        {
            std::cout << "Found with external id 35 internal id is " << it2->second->internal_id_ << std::endl;
            break;
        }
    }

    /* remove from map and clean up allocated MyData objects ... */
}

Which approach could I take to take to improve the look up from the external reference?

I have considered the following as options.

Of these the 3rd option seems like the most sane. Are there any better options?

Upvotes: 0

Views: 2243

Answers (2)

Tony Delroy
Tony Delroy

Reputation: 106096

  • If either of the keys are near-contiguous (i.e. typically use successive values without too many unused numbers in between) then an array - directly indexed by that id - is optimal, otherwise
  • if you're creating new keys that are ever-higher numerically, you could push_back to a vector and use a std::binary_search or even an interpolation search, otherwise
  • unordered_map or map.

As always - to know what's fastest, implement the alternatives and measure (but I've listed them above in my expected order of performance).

If using the 1st or 3rd option, you might want to put both maps into a class so that insertions & deletions are consistently done across both, and linked-to objects only deleted when not needed (you could also manage this using shared pointers but that might be a bit heavyweight - depends on your needs.

Upvotes: 2

user3125280
user3125280

Reputation: 2829

It may be enough to just map external id's to internal id's. That way an object can always be found given either of its id's. If you need to delete something via one key, you find it, determine it's other key, then delete it and its external key entry.

(This so you don't have to change the existing lookup code, only add a new map)

Upvotes: 1

Related Questions