POD
POD

Reputation: 519

Vector or Map or Hash map for C++?

I have a large number of records, say around 4,000,000, that I want to address them repeatedly and put information in a class that is linked to that record. I am not sure what kind of data structure should I use? Should I use the vectors, maps, or hash maps. I don't need to insert a record, but I need to read a table which contain sets of these records numbers (or names), and then grab some of the data which are linked to that record and do some processes on them. Is the finding on map fast enough to not go for hashmaps for this example? The records have a class as its structure and I have not done anything before with using the map or hashmap that has a class as its value (if it is possible). Thanks in advance guys.

Edited:

I do not need to have all the records on the memory at the same time for now> I need to first give it a structure first and then grab the data from some of the records. The total number of records is around 20 million, and I want to read each of these raw records and then if its basic info doesn't exist in my new map or vector that I want to create and put the rest of data in there as a vector. Because I have 20 million records, I think it would be very excruciating that for every record go through 4 million record to find if the basic info of that record exist or not. I have around 4 million type of packages and each of these packages could have more than one type of service (roughly around 5 (20/4) per package). I want to read each of these records and then if the package ID does not exist into the vector or whatever I want to use and push the basic info into the vector and then for the services that are related to that package be saved in a vector inside the package class.

Upvotes: 4

Views: 7152

Answers (1)

Samy Arous
Samy Arous

Reputation: 6814

These three data structures have each a different purpose.

A vector is basically a dynamic array, which is good for indexed values.

A map is a sorted data-structure with O(log(n)) retrieval and insertion time (implemented using a balanced binary tree, usually Red-Black). This is best if you can't find an efficient hash method.

A hash_map uses hashes to retrieve object. If you have a well defined hash function with a low collision rate, you will get constant retrieval and insertion time on average. hash_maps are usually faster than map but not always. It is highly dependent on the hash function.

For your example, I think that it is best to use a hash_map where the key would be the record number (assuming record numbers are unique).

If these record numbers are dense (meaning there are little or no gaps between the indexes , say: 1,2,4,5,8,9,10...), you can use a vector. If your records are coming from a database with an autoincrement primary key and not many deletions, this should be usually the case.

Upvotes: 7

Related Questions