A. Sarid
A. Sarid

Reputation: 3996

Stable sort a C++ hash map - preserve the insertion order for equal elements

Say I have a std::unordered_map<std::string, int> that represents a word and the number of times that word appeared in a book, and I want to be able to sort it by the value.
The problem is, I want the sorting to be stable, so that in case two items have equal value I want the one who got inserted first to the map to be first.

It is simple to implement it by adding addition field that will keep the time it got inserted. Then, create a comperator that uses both time and the value. Using simple std::sort will give me O(Nlog(N)) time complexity.

In my case, space is not an issue whenever time can be improved. I want to take advantage of it and do a bucket sorting. Which should give me O(N) time complexity. But when using bucket sorting, there is no comperator, when iterating the items in the map the order is not preserved.

How can I both make it stable and still keep the O(N) time complexity via bucket sorting or something else?
I guess that if I had some kind of hash map that preserves the order of insertion while iterating it, it would solve my issue.

Any other solutions with the same time complexity are acceptable.

Note - I already saw this and that and due to the fact that they are both from 2009 and that my case is more specific I think, I opened this question.

Upvotes: 1

Views: 463

Answers (1)

A. Sarid
A. Sarid

Reputation: 3996

Here is a possible solution I came up with using an std::unordered_map and tracking the order of inserting using a std::vector.

  1. Create a hash map with the string as key and count as value.
    In addition, create a vector with iterators to that map type.
  2. When counting elements, if the object is not yet in the map, add to both map and vector. Else, just increment the counter. The vector will preserve the order the elements got inserted to the map, and the insertion / update will still be in O(1) time complexity.
  3. Apply bucket sort by iterating over the vector (instead of the map), this ensures the order is preserved and we'll get a stable sort. O(N)
  4. Extract from the buckets to make a sorted array. O(N)

Implementation:

    unordered_map<std::string, int> map;
    std::vector<std::unordered_map<std::string,int>::iterator> order;

    // Lets assume this is my string stream
    std::vector<std::string> words = {"a","b","a" ... };

    // Insert elements to map and the corresponding iterator to order
    for (auto& word : words){
        auto it = map.emplace(word,1);
        if (!it.second){
            it.first->second++;
        }
        else {
            order.push_back(it.first);
        }
        max_count = std::max(max_count,it.first->second);
    }

    //  Bucket Sorting
    /*  We are iterating over the vector and not the map
        this ensures we are iterating by the order they got inserted */
    std::vector<std::vector<std::string>> buckets(max_count);
    for (auto o : order){
        int count = o->second;
        buckets[count-1].push_back(o->first);
    }

    std::vector<std::string> res;
    for (auto it = buckets.rbegin(); it != buckets.rend(); ++it)
        for (auto& str : *it)
            res.push_back(str);

Upvotes: 1

Related Questions