Alex
Alex

Reputation: 2289

Creating a map for millions of objects in C++

I have an abstract class called Object that has a few virtual functions, one of which is a function that will retrieve the id of an Object.

Currently, I am using a std::vector<Object> to store tens of millions of these objects. Unfortunately, adding, copying, or removing from this is painfully slow.

I wanted to create a hash map that could maybe have the Object->id as the key, and maybe the object itself as a value? Or is there some type of data structure that would allow for easy insertion and removal like a std::vector but would be faster for tens of millions of objects?

I would want the class to end up looking something like this outline:

stl::container<Objects*> obj_container;

DataContainer::DataContainer()
    : stl::container(initialized_here)
{}

DataContainer::addObject(Object* object)
{
    obj_container.insert(object);
}

DataContainer::removeObject(Object* object)
{
    obj_container.remove(object);
}

DataContainer::preSort()
{
     obj_container.sort_by_id();
}

DataContainer::getObject(Object* object)
{
    if(!obj_container.contains(object)) { return; }
    binary_search(object);
}        

Is there anything really fast at processing large amounts of these objects, or is there anything really fast that could possibly use an unsigned integer id from an object to process the data?

Also, my class would get pre-sorted, so every object would be sorted by ID before being added to the container. Then I would do a binary search on the data by ID.

Upvotes: 3

Views: 1266

Answers (2)

You probably could use std::set (if the id-s have some order and are unique for it) or std::unordered_set and I would suggest you make it a component of your container, not derive your container from it. You'll better have a way of constructing a local fake Object with only its id ...

class Object {
   friend class BigContainer;
   unsigned long _id;
   // other fields;
   // your constructors
  public:
    unsigned long id() const { return _id; };
  private:
   Object(unsigned long pseudoid); // construct a fake object
};

struct LessById {
  bool operator () (const Object &ob1, const Object& ob2) 
    { return ob1.id() < ob2.id(); };
  bool operator () (const Object &ob, unsigned long idl)
    { return ob1.id() < idl;
};

class BigContainer {
   std::set<Object,LessById> set;
public:
   // add members, constructors, destructors, etc...
   bool contains(unsigned long id) const {
     Object fakeobj{id};
     if (set.find(fakeobj) != set.end()) return true;
     return false;
   };
   const Object* find_by_id(unsigned long id) const {
     Object fakeobj{id};
     auto p = set.find(fakeobj);
     if (p != set.end()) return &(*p);
     return nullptr;
   };

   bool contains(const Object& ob) const {
      if (set.find(ob) != set.end()) return true;
      return false;
   };
   void add(const Object&ob) const {
     Object fakeobj{id};
     auto p = set.find(fakeobj);
     if (p == set.end()) set.insert(ob);
   }
   void remove(unsigned long id) const {
     Object fakeobj{id};
     auto p = set.find(fakeobj);
     if (p != set.end()) set.erase(p);
   }
};

If you want a set of pointers use a set of some smart pointers and adapt the scheme above.

If the Object is big and you have trouble in defining a way of constructing efficiently local fake objects for a given id, define a super struct BoxedId { unsigned long id; BoxedId(unsigned long l): id(l) {}; }, declare internally a std::set<std::shared_ptr<BoxedId>,BoxedLessById> make class Object : public BoxedId, etc...

BTW, since Object has virtual methods you probably will subclass it and you need to have a set of pointers. You need to define a pointer policy (are every actual instances of sub-classes of Object-s in your Container) and use some smart pointer.... You need to define who is in charge of delete-ing your Object-s (who owns the pointer). Is it only the unique BigContainer.

Read the C++11 rule of five.

Upvotes: 2

CreativeMind
CreativeMind

Reputation: 917

Please have a look at this site : http://www.cs.northwestern.edu/~riesbeck/programming/c++/stl-summary.html

It shows the time complexity of each operation of each STL.

First be clear about your requirement and then choose particular STL wisely by comparing its time complexity shown in above link.

Upvotes: 1

Related Questions