The PhobiCeron
The PhobiCeron

Reputation: 55

which c++ stl data structure will be the most efficient for storing unique values and their counts?

I have a list of values, with some recurrences, e.g. : {1,2,2,3,3,3,3,7,8,1}

I want to store the unique values in this list in a data structure along with their counts.

    --------------
    |value |count|
    --------------
    |  1   |  2  |
    --------------
    |  2   |  2  |
    --------------
    |  3   |  4  |
    --------------
    |  7   |  1  | 
    --------------
    |  8   |  1  |
    --------------

which c++ standard library data structure will be the most efficient in doing so?

edit: i wont be modifying the structure in any way, i just want to know the count as the count will help me determine the output to a programming question.

Upvotes: 1

Views: 1254

Answers (2)

einpoklum
einpoklum

Reputation: 132148

First note that asking for "the most efficient" data structure is not a proper characterization of your requirements. Do you want a solution which:

  • is the fastest to use? In what use cases?
  • takes up the least amount of memory?
  • is the most maintainable/readable?
  • is the least bug-prone?
  • is the fastest to write?
  • exists alongside the original data structure (for the uncounted values), or in its stead?

You see, there are different kinds and aspects to efficiency.

Having said that, you could try:

  • A simple and straightforward solution was suggested to you by @songyuanyao and @RahulGupta: Use a map - an std::map if you want to interate your value-counts in increasing order, or an std::unordered_map if you don't care about the order. This will be easy to write and maintain, and kind-of-ok in terms of the time of inserting or removing an element. Still, both of these map structures are quite slow, so you might rethink whether you even want a standard-library map implementation.

  • An alternative solution - which is more efficient in terms of space and time if you perform a lot of reads and few inserts/updates - is what @KonradRudolph suggested in a comment: std::pair<std::vector<value_type>, std::vector<count_type>> or std::vector<std::pair<value_type, count_type>>; and make sure the count_type is large enough that you don't exceed it, but as small as it can be to reduce the amount of time you need for reading the whole structure. These will use a lot less space than the maps, since there are no bucket-lists, no empty

    Note that the choice between a vector-of-pairs or a pair-of-vectors is a common dilemma in data structure design, and is also known as "structure of arrays vs array of structs", or SoA vs AoS. See a concrete example here on the site and there are many others. AoS is better when you usually access both fields, and need the corresponding values together; SoA is better when you often need just one field (e.g. you want to sum up the counts between some range of values; or you want to obtain the set of all prime values etc.) This also relates to the architecture of databases - row-oriented vs column-oriented, with the former being more appropriate for transaction processing and the latter for analytic workloads.

Upvotes: 6

Rahul Gupta
Rahul Gupta

Reputation: 11

You can use a map in c++ declaration can be done as

map<int,int>map_name;

for inserting you can run a loop

for(auto itr:list_name)
    map_name[itr]++;

for(auto c:map_name)
cout << c.first << " " << c.second << endl;

Upvotes: 0

Related Questions