Reputation: 63
I need to write code to store the unique characters and their frequencies in a dynamic array. I need to increase its size as new data comes in. New data in this case will be a new character that is encountered. The algorithm I have in mind is to check the list of known characters every single time I read from the given string. If it is a new character I need to increase the array size by 1. If it is not a new character I will increase its frequency. It is an array of struct letter (in the code below). The problem is that, I spent quite a lot of time with this and had issues with implementing it. So the question is how can I exactly implement it? Thank you spending time to help.
#include <iostream>
#include <string>
#include <bitset>
#define ARR_LEN(arr) sizeof(arr)/sizeof(arr[0])
using namespace std;
struct unique_char {
char character;
int frequency;
};
int main() {
int char_count;
string str;
getline(cin, str);
struct unique_char* chars = new struct unique_char[100];
system("PAUSE");
exit(0);
}
Upvotes: 0
Views: 1652
Reputation: 57678
Using an array makes things straight forward:
unsigned int frequencies[256] = {0};
while (std::getline(std::cin, str))
{
const size_t length = str.length();
for (unsigned int i = 0; i < length; ++i)
{
const char c = str[i];
++frequencies[c];
}
}
Although, you may want to improve efficiency:
const size_t BUFFER_SIZE = 1024u * 1024u;
//...
char buffer[BUFFER_SIZE] = {0};
while (std::cin.read(&buffer[0], BUFFER_SIZE)
{
const size_t chars_read = cin.gcount();
for (unsigned int i = 0; i < chars_read; ++i)
{
const char c = buffer[i];
++frequencies[c];
}
}
The above code uses block reading to improve input performance. No scanning for newline characters, just read straight into memory. Determine the frequencies from the characters in memory.
Edit 1: unsigned char
From the comments, an unsigned char
may be a safer data type than char
because char
can be signed. This may be an issue when accessing the array slots because a signed char
could be negative and negative indices are usually a bad thing. When you run it, if there are issues, replace the char
type with unsigned char
.
Upvotes: 2
Reputation:
As mentionned in the comments, using std::map
makes this fairly straightforward.
One of the "fun" things about map is that the indexing operator creates new values "on demand" with a initial value of 0
for ints. So the actual code is essentially one line: chars[c] += 1;
#include <map>
#include <iostream>
#include <string>
using namespace std;
int main() {
map<char, int> chars;
string str;
getline(cin, str);
for(char c: str) {
chars[c] += 1;
}
for(auto [character, frequency]: chars) {
cout << character << " : " << frequency << "\n";
}
}
N.B. There is one major difference between this and @ThomasMatthews's answer:
The map will only contain the characters that have been seen, whereas the array will contain 0
s for all characters that were never hit. Which approach you use should be based on which of the two are more useful to you.
Upvotes: 2