Reputation: 55
I want to make a table of letter frequency, just like this
So, I have variable str
that contains some text, and I have map<char, int> m
, that contains number of occurrences of each letter.
So I have something like that:
vector<char> get_rank(const string& str)
{
map<char, int> m;
for (char i = 'a'; i <= 'z'; i++)
m[i] = 0;
for (char ch : str)
m[ch]++;
for (auto& it : m)
v.push_back(it.first);
return v;
}
But map is ordered by keys, not by values. Is there a way to "overload" the comparator of map to order each element by value? If the answer is no, how can I implement the same idea but with the same elegancy. Because every idea I have seems a little dull and rough.
Upvotes: 1
Views: 746
Reputation: 15277
I find the approach to use specialized containers very good.
For counting a std::map
or in your case a std::unordered_map
should be used. After that it can be copied into a std::vector
and sorted as you like. Example:
#include <iostream>
#include <string>
#include <utility>
#include <vector>
#include <unordered_map>
#include <algorithm>
// Function to get the rank of letters in a string
std::vector<std::pair<char,int>> getRank(const std::string& str) {
// Count letters
std::unordered_map<char, int> counter;
for (const char c : str) counter[c]++;
// Copy to vector and sort
std::vector<std::pair<char, int>> rank(counter.begin(), counter.end());
std::sort(rank.begin(), rank.end(), [](const std::pair<char, int>& p1, const std::pair<char, int>& p2) { return (p1.second == p2.second) ? p1.first<p2.first : p1.second>p2.second; });
return rank;
}
// Test / Driver Code
int main() {
std::string testString{ "122333444455555666666777777788888888" };
for (const auto& [letter, count] : getRank(testString))
std::cout << letter << ' ' << count << '\n';
}
This will work in a simple way.
Usage of a std::multiset
would make the code even more compact, because it will sort the values for you automatically:
#include <iostream>
#include <string>
#include <utility>
#include <set>
#include <unordered_map>
#include <type_traits>
// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<const char, unsigned int>;
// Standard approach for counter
using Counter = std::unordered_map<std::remove_const<Pair::first_type>::type, Pair::second_type>;
// Sorted values will be stored in a multiset
struct Comp { bool operator ()(const Pair& p1, const Pair& p2) const { return (p1.second == p2.second) ? p1.first<p2.first : p1.second>p2.second; } };
using Rank = std::multiset<Pair, Comp>;
// ------------------------------------------------------------
// Function to get the rank of letters in a string
Rank getRank(const std::string& str) {
Counter counter;
for (const char c : str) counter[c]++;
return { counter.begin(), counter.end() };
}
// Test / Driver Code
int main() {
std::string testString{ "122333444455555666666777777788888888" };
for (const auto& [letter, count] : getRank(testString))
std::cout << letter << ' ' << count << '\n';
}
It is also possible to find the most frequent used element using a max heap.
Or, In the below example, we can get the x topmost elements. And this with just some few lines of code. The below will count nearly evrything and sortit according to the value:
#include <iostream>
#include <utility>
#include <unordered_map>
#include <algorithm>
#include <vector>
#include <iterator>
#include <type_traits>
// Helper for type trait We want to identify an iterable container ----------------------------------------------------
template <typename Container>
auto isIterableHelper(int) -> decltype (
std::begin(std::declval<Container&>()) != std::end(std::declval<Container&>()), // begin/end and operator !=
++std::declval<decltype(std::begin(std::declval<Container&>()))&>(), // operator ++
void(*std::begin(std::declval<Container&>())), // operator*
void(), // Handle potential operator ,
std::true_type{});
template <typename T>
std::false_type isIterableHelper(...);
// The type trait -----------------------------------------------------------------------------------------------------
template <typename Container>
using is_iterable = decltype(isIterableHelper<Container>(0));
// Some Alias names for later easier reading --------------------------------------------------------------------------
template <typename Container>
using ValueType = std::decay_t<decltype(*std::begin(std::declval<Container&>()))>;
template <typename Container>
using Pair = std::pair<ValueType<Container>, size_t>;
template <typename Container>
using Counter = std::unordered_map<ValueType<Container>, size_t>;
// Function to get the k most frequent elements used in any Container ------------------------------------------------
template <class Container>
auto topKFrequent(const Container& data, size_t k) {
if constexpr (is_iterable<Container>::value) {
// Count all occurences of data
Counter<Container> counter{};
for (const auto& d : data) counter[d]++;
// For storing the top k
std::vector<Pair<Container>> top(k);
// Get top k
std::partial_sort_copy(counter.begin(), counter.end(), top.begin(), top.end(),
[](const std::pair<int, size_t >& p1, const std::pair<int, size_t>& p2) { return p1.second > p2.second; });
return top;
}
else
return data;
}
int main() {
std::vector testVector{ 1,2,2,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,6,7 };
for (const auto& p : topKFrequent(testVector, 2)) std::cout << "Value: " << p.first << " \t Count: " << p.second << '\n';
std::cout << '\n';
double cStyleArray[] = { 1.1, 2.2, 2.2, 3.3, 3.3, 3.3 };
for (const auto& p : topKFrequent(cStyleArray, 2)) std::cout << "Value: " << p.first << " \t Count: " << p.second << '\n';
std::cout << '\n';
std::string s{ "abbcccddddeeeeeffffffggggggg" };
for (const auto& p : topKFrequent(s, 2)) std::cout << "Value: " << p.first << " \t Count: " << p.second << '\n';
std::cout << '\n';
double value = 12.34;
std::cout << topKFrequent(value, 2) << "\n";
return 0;
}
The above code is for C++17
And below is an example to find the most frequent element in any container.
But this is a C++20 solution:
#include <iostream>
#include <utility>
#include <unordered_map>
#include <queue>
#include <vector>
#include <iterator>
#include <type_traits>
#include <string>
// Some Alias names for later easier reading --------------------------------------------------------------------------
template <typename Container>
using ValueType = std::ranges::range_value_t<Container>;
template <typename Container>
using Pair = std::pair<ValueType<Container>, size_t>;
template <typename Container>
using Counter = std::unordered_map<ValueType<Container>, size_t>;
template <typename Container>
using UnderlyingContainer = std::vector<Pair<Container>>;
// Predicate Functor
template <class Container> struct LessForSecondOfPair {
bool operator ()(const Pair<Container>& p1, const Pair<Container>& p2) { return p1.second < p2.second; }
};
template <typename Container>
using MaxHeap = std::priority_queue<Pair<Container>, UnderlyingContainer<Container>, LessForSecondOfPair<Container>>;
// Calculate max element ---------------------------------------------------------------------------------------------
template <class Container>
auto topFrequent(const Container& data) {
if constexpr (std::ranges::range<Container>) {
// Count all occurences of data
Counter<Container> counter{};
for (const auto& d : data) counter[d]++;
// Build a Max-Heap
MaxHeap<Container> maxHeap(counter.begin(), counter.end());
// Return most frequent number
return maxHeap.top().first;
}
else
return data;
}
// Test
int main() {
std::vector testVector{ 1,2,2,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,6,7 };
std::cout << "Most frequent is: " << topFrequent(testVector) << "\n";
double cStyleArray[] = { 1.1, 2.2, 2.2, 3.3, 3.3, 3.3 };
std::cout << "Most frequent is: " << topFrequent(cStyleArray) << "\n";
std::string s{ "abbcccddddeeeeeffffffggggggg" };
std::cout << "Most frequent is: " << topFrequent(s) << "\n";
double value = 12.34;
std::cout << "Most frequent is: " << topFrequent(value) << "\n";
return 0;
}
Upvotes: 2
Reputation: 136
Instead of trying to do something with a STL container that is clearly not designed for this (ordering a map) maybe use a different container.
You can always use std::pair<char, int>
as the type for an std::vector
.
These can be sorted.
I'd prefer std::vector<std::pair<char,int>>
simply because I'm more used to working with vectors but that's up to you.
Upvotes: 1
Reputation: 122238
It is not possible to sort a map with respect to its mapped values. The order of elements of a map is managed by the map according to its comparator which only considers keys.
However, you do not need a map to count frequencies when the keys are contiguous and of limited range.
Instead of a map, use a std::vector<letter_and_count>
with
struct letter_and_count {
char letter;
int count = 0;
};
Initialize the vector with elements with letter
initialized to a
till z
then increment v[ i - 'a']
when you encounter the letter i
in the string. Then use std::sort
to sort the vector according to the count
member.
PS: Note comments below. The letters a
till z
are not necessarily contiguous. However (idea also from eerorika) you can use a simple lookup table:
std::string alphabet{"abcdefghijklmnopqrstuvwxyz"};
Then first find the letter in that alphabet
and use its index in the alphabet for the counting.
Upvotes: 3
Reputation: 238311
Is there a way to "overload" a comparator of map to order each element by value?
No.
If the answer is no, how can I implement the same idea but with the same elegancy.
Depends on use case. An elegant solution to the example is to sort the vector.
Upvotes: 2