Abhishek Mishra
Abhishek Mishra

Reputation: 5040

Finding Frequency of numbers in a given group of numbers

Suppose we have a vector/array in C++ and we wish to count which of these N elements has maximum repetitive occurrences and output the highest count. Which algorithm is best suited for this job.

example:

int a = { 2, 456, 34, 3456, 2, 435, 2, 456, 2}

the output is 4 because 2 occurs 4 times. That is the maximum number of times 2 occurs.

Upvotes: 8

Views: 7233

Answers (10)

A M
A M

Reputation: 15265

Now, in the year 2022 we have

  • namespace aliases
  • more modern containers like std::unordered_map
  • CTAD (Class Template Argument Deduction)
  • range based for loops
  • using statment
  • the std::ranges library
  • more modern algorithms
  • projections
  • structured bindings

With that we can now write:

#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>

namespace rng = std::ranges;

int main() {
    // Demo data
    std::vector data{ 2, 456, 34, 3456, 2, 435, 2, 456, 2 };

    // Count values
    using Counter = std::unordered_map<decltype (data)::value_type, std::size_t> ;

    Counter counter{}; for (const auto& d : data) counter[d]++;

    // Get max
    const auto& [value, count] = *rng::max_element(counter, {}, &Counter::value_type::second);

    // Show output
    std::cout << '\n' << value << " found " << count << " times\n";
}

Upvotes: 0

kowsalya
kowsalya

Reputation:

It wil be in O(n)............ but the thing is the large no. of array can take another array with same size............

for(i=0;i

mar=count[o]; index=o;

for(i=0;i

then the output will be......... the element index is occured for max no. of times in this array........

here a[] is the data array where we need to search the max occurance of certain no. in an array.......

count[] having the count of each element.......... Note : we alrdy knw the range of datas will be in array.. say for eg. the datas in that array ranges from 1 to 100....... then have the count array of 100 elements to keep track, if its occured increament the indexed value by one........

Upvotes: 0

Alastair
Alastair

Reputation: 4523

Here's my complete, tested, version, using a std::tr1::unordered_map.

I make this approximately O(n). Firstly it iterates through the n input values to insert/update the counts in the unordered_map, then it does a partial_sort_copy which is O(n). 2*O(n) ~= O(n).

#include <unordered_map>
#include <vector>
#include <algorithm>
#include <iostream>

namespace {
// Only used in most_frequent but can't be a local class because of the member template
struct second_greater {
    // Need to compare two (slightly) different types of pairs
    template <typename PairA, typename PairB>
    bool operator() (const PairA& a, const PairB& b) const
        { return a.second > b.second; }
};
}

template <typename Iter>
std::pair<typename std::iterator_traits<Iter>::value_type, unsigned int>
most_frequent(Iter begin, Iter end)
{
    typedef typename std::iterator_traits<Iter>::value_type value_type;
    typedef std::pair<value_type, unsigned int> result_type;

    std::tr1::unordered_map<value_type, unsigned int> counts;

    for(; begin != end; ++begin)
        // This is safe because new entries in the map are defined to be initialized to 0 for
        // built-in numeric types - no need to initialize them first
        ++ counts[*begin];

    // Only need the top one at this point (could easily expand to top-n)
    std::vector<result_type> top(1);

    std::partial_sort_copy(counts.begin(), counts.end(),
                           top.begin(), top.end(), second_greater());

    return top.front();
}

int main(int argc, char* argv[])
{
    int a[] = { 2, 456, 34, 3456, 2, 435, 2, 456, 2 };

    std::pair<int, unsigned int> m = most_frequent(a, a + (sizeof(a) / sizeof(a[0])));

    std::cout << "most common = " << m.first << " (" << m.second << " instances)" << std::endl;
    assert(m.first == 2);
    assert(m.second == 4);

    return 0;
}

Upvotes: 0

Tyler
Tyler

Reputation: 28874

The hash algorithm (build count[i] = #occurrences(i) in basically linear time) is very practical, but is theoretically not strictly O(n) because there could be hash collisions during the process.

An interesting special case of this question is the majority algorithm, where you want to find an element which is present in at least n/2 of the array entries, if any such element exists.

Here is a quick explanation, and a more detailed explanation of how to do this in linear time, without any sort of hash trickiness.

Upvotes: 1

mjcohenw
mjcohenw

Reputation:

If the range of elements is large compared with the number of elements, I would, as others have said, just sort and scan. This is time n*log n and no additional space (maybe log n additional).

THe problem with the counting sort is that, if the range of values is large, it can take more time to initialize the count array than to sort.

Upvotes: 0

Nicola Bonelli
Nicola Bonelli

Reputation: 8287

A possible C++ implementation that makes use of STL could be:

#include <iostream>
#include <algorithm>
#include <map>

// functor
struct maxoccur
{
    int _M_val;
    int _M_rep;

    maxoccur()
    : _M_val(0),
      _M_rep(0)
    {}

    void operator()(const std::pair<int,int> &e)
    {
        std::cout << "pair: " << e.first << " " << e.second << std::endl;
        if ( _M_rep < e.second ) {
            _M_val = e.first;
            _M_rep = e.second;
        }
    }
};

int
main(int argc, char *argv[])
{
    int a[] = {2,456,34,3456,2,435,2,456,2};
    std::map<int,int> m; 

    // load the map
    for(unsigned int i=0; i< sizeof(a)/sizeof(a[0]); i++) 
        m [a[i]]++;

    // find the max occurence...
    maxoccur ret = std::for_each(m.begin(), m.end(), maxoccur());
    std::cout << "value:" << ret._M_val << " max repetition:" << ret._M_rep <<  std::endl;

    return 0;
}

Upvotes: 2

Thorsten79
Thorsten79

Reputation: 10128

If you have the RAM and your values are not too large, use counting sort.

Upvotes: 4

Sklivvz
Sklivvz

Reputation: 31143

Optimized for space:

Quicksort (for example) then iterate over the items, keeping track of largest count only. At best O(N log N).

Optimized for speed:

Iterate over all elements, keeping track of the separate counts. This algorithm will always be O(n).

Upvotes: 10

Franci Penov
Franci Penov

Reputation: 76001

Sort the array and then do a quick pass to count each number. The algorithm has O(N*logN) complexity.

Alternatively, create a hash table, using the number as the key. Store in the hashtable a counter for each element you've keyed. You'll be able to count all elements in one pass; however, the complexity of the algorithm now depends on the complexity of your hasing function.

Upvotes: 18

UnkwnTech
UnkwnTech

Reputation: 90911

a bit of pseudo-code:

//split string into array firts
strsplit(numbers) //PHP function name to split a string into it's components
i=0
while( i < count(array))
 {
   if(isset(list[array[i]]))
    {
      list[array[i]]['count'] = list + 1
    }
   else
    {
      list[i]['count'] = 1
      list[i]['number']
    }
   i=i+1
 }
usort(list) //usort is a php function that sorts an array by its value not its key, Im assuming that you have something in c++ that does this
print list[0]['number'] //Should contain the most used number

Upvotes: 1

Related Questions