Neosapien
Neosapien

Reputation: 177

What is the fastest way to get the frequency of numbers in an array in C++?

My method creates an std::map<int, int> and populates it with the number and its frequency by iterating over the array once, but I'm wondering if there's a quicker way without using a map.

Upvotes: 0

Views: 1084

Answers (3)

Max Cury
Max Cury

Reputation: 55

I think this should be a faster way to do it:

unordered_map<uint32_t, uint32_t> freqs(array_size);
for (uint32_t number : array) freqs[number] += 1;

Of cause, everything should be measured. But my assumption is that std::unordered_map can't be slower than std::map in this case, because hash function for an integer number should be just static_cast<size_t>(number), which is O(1). Though maybe another type of container can make it even faster, for example a flat hash map. But I can't say anything about it. Micro optimizations here are as follows:

  • cycle with uint32_t can be a little bit faster because it is 4 bytes only, whereas a pointer/reference on a 64x architecture takes 8 bytes

  • += won't generate asm code that will return the accessed value even if you don't turn on /O2 compiler flag for optimization

  • reserve array_size buckets in the hash table, that will lessen the amount of rehashes

Upvotes: 0

gxu
gxu

Reputation: 15

NOTE: ValueType, DifferenceType are defined to be

template <std::input_iterator I>
using ValueType = typename std::iterator_traits<I>::value_type;

template <std::input_iterator I>
using
DifferenceType = typename std::iterator_traits<I>::difference_type;

If the array is sorted, you can use std::equal_range to find the range of elements that is equal to x. With concepts you write:

// I2 is homomorphic to std::pair<I, unsigned>
// [first, last) is partially ordered with respect to I::value_type
// return value is d_first + |{x | x in [first, last)}|
// R is a relation over I, compare element using R
template <std::random_access_iterator I, std::forward_iterator I2,
std::relation<bool, ValueType<I>> R = std::less<ValueType<I>>>
requires(std::regular<ValueType<I>> &&
std::is_constructible_v<ValueType<I2>, I, DifferenceType<I>>)

I2 frequency_sorted(I first, I last, I2 d_first, R r = R())
{
  while(first != last)
  {
    auto [left, right] = std::equal_range(first, last, *first, r);
    *d_first = {left, std::distance(left, right)};
    ++d_first;
    first = right;
  }
  return d_first;
}

If you have limited resources, you can truncate the result and have:

// I2 is homomorphic to std::pair<I, unsigned>
// [first, last) is partially ordered with respect to I::value_type
// return value is a pair, where the first element is 
// the starting point of subsequence [first, last) where such
// subsequence is unevaluated
// the second element is 
// - d_last if |{x | x in [first, last)}| >= d_last - d_first
// - d_first + |{x | x in [first, last)}| if otherwise
template <std::random_access_iterator I, std::forward_iterator I2,
std::relation<bool, ValueType<I>> R = std::less<ValueType<I>>>
requires(std::regular<ValueType<I>> &&
std::is_constructible_v<ValueType<I2>, I, DifferenceType<I>>)

std::pair<I, I2>
frequency_sorted_truncate(I first, I last, I2 d_first, I2 d_last, R r = R())
{
  while(first != last && d_first != d_last)
  {
    auto [left, right] = std::equal_range(first, last, *first, r);
    *d_first = {left, std::distance(left, right)};
    ++d_first;
    first = right;
  }
  return {first, d_first};
}

These two functions allow you to pass in any relation, and the default comparison uses operator<.

If your array is unsorted, and the size of the array is large enough, then it might be a good idea to just sort the array and use the algorithm. Hashing might be tempting but it creates cache miss and might not be as fast as you would expect. You can try both methods and measure which one is faster, you are welcome to tell me the result.

My compiler version is g++ 11.2.11, I think the code can be compiled with a C++ 20 compiler. If you don't have one, simply replace the concepts part with typename, I think by doing that you will only need a C++ 17 compiler(due to structural binding).

Please tell me whether my code can be improved.

Upvotes: 1

463035818_is_not_an_ai
463035818_is_not_an_ai

Reputation: 123114

std::unordered_map<int,int> can count frequencies as well but its operator[] has complexity (cppreference):

Average case: constant, worst case: linear in size.

Compared to

Logarithmic in the size of the container.

with a std::map.

When the maximum number is small you can use an array, and directly count:

 for (const auto& number : array) counter[number]++;

Admittetly, all this has already been said in comments, so I'll also add this one: You need to measure. Complexity is only about asymptotic runtime, while for given input size a std::map can actually be faster.

Upvotes: 6

Related Questions