rrazd
rrazd

Reputation: 1739

Determining most freq char element in a vector<char>?

I am trying to determine the most frequent character in a vector that has chars as its elements.

I am thinking of doing this:

This seems quite convoluted though, thus I was wondering if someone could suggest if this method would be considered 'acceptable' in terms of performance/good coding

Can this be done in a better way?

Upvotes: 2

Views: 1846

Answers (2)

Phillip Ngan
Phillip Ngan

Reputation: 16106

Sorting a vector of chars and then iterating through that looking for the maximum run lengths seems to be 5 times faster than using the map approach (using the fairly unscientific test code below acting on 16M chars). On the surface both functions should perform close to each other because they execute with O(N log N). However the sorting method probably benefits from branch prediction and move semantics of the in-place vector sort.

The resultant output is:

Most freq char is '\334', appears 66288 times.
usingSort() took 938 milliseconds
Most freq char is '\334', appears 66288 times.
usingMap() took 5124 milliseconds

And the code is:

#include <iostream>
#include <map>
#include <vector>
#include <chrono>

void usingMap(std::vector<char> v)
{
    std::map<char, int> m;
    for ( auto c : v )
    {
        auto it= m.find(c);
        if( it != m.end() )
            m[c]++;
        else
            m[c] = 1;
    }

    char mostFreq;
    int count = 0;
    for ( auto mi : m )
        if ( mi.second > count )
        {
            mostFreq = mi.first;
            count = mi.second;
        }

    std::cout << "Most freq char is '" << mostFreq << "', appears " << count << " times.\n";
}

void usingSort(std::vector<char> v)
{
    std::sort( v.begin(), v.end() );
    char currentChar = v[0];
    char mostChar = v[0];
    int currentCount = 0;
    int mostCount = 0;
    for ( auto c : v )
    {
        if ( c == currentChar )
            currentCount++;
        else
        {
            if ( currentCount > mostCount)
            {
                mostChar = currentChar;
                mostCount = currentCount;
            }
            currentChar = c;
            currentCount = 1;
        }
    }

    std::cout << "Most freq char is '" << mostChar << "', appears " << mostCount << " times.\n";
}

int main(int argc, const char * argv[])
{
    size_t size = 1024*1024*16;
    std::vector<char> v(size);
    for ( int i = 0; i < size; i++)
    {
        v[i] = random() % 256;
    }

    auto t1 = std::chrono::high_resolution_clock::now();
    usingSort(v);
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout
        << "usingSort() took "
        << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
        << " milliseconds\n";
    auto t3 = std::chrono::high_resolution_clock::now();
    usingMap(v);
    auto t4 = std::chrono::high_resolution_clock::now();
    std::cout
        << "usingMap() took "
        << std::chrono::duration_cast<std::chrono::milliseconds>(t4-t3).count()
        << " milliseconds\n";
    return 0;
}

Upvotes: 2

Ivaylo Strandjev
Ivaylo Strandjev

Reputation: 70929

If you are only using regular ascii characters, you can make the solution a bit faster - instead of using a map, use an array of size 256 and count the occurrences of the character with a given code 'x' in the array cell count[x]. This will remove an logarithm(256) from your solution and thus will make it a bit faster. I do not think much more can be done with respect to optimization of this algorithm.

Upvotes: 6

Related Questions