dekaottoman
dekaottoman

Reputation: 63

C++ - How do I frequency count characters?

I need to write code to store the unique characters and their frequencies in a dynamic array. I need to increase its size as new data comes in. New data in this case will be a new character that is encountered. The algorithm I have in mind is to check the list of known characters every single time I read from the given string. If it is a new character I need to increase the array size by 1. If it is not a new character I will increase its frequency. It is an array of struct letter (in the code below). The problem is that, I spent quite a lot of time with this and had issues with implementing it. So the question is how can I exactly implement it? Thank you spending time to help.

#include <iostream>
#include <string>
#include <bitset>
#define ARR_LEN(arr) sizeof(arr)/sizeof(arr[0])

using namespace std;

struct unique_char {
    char character;
    int frequency;
};

int main() {
    int char_count;
    string str;
    getline(cin, str);

    struct unique_char* chars = new struct unique_char[100];

    system("PAUSE");
    exit(0);
}

Upvotes: 0

Views: 1652

Answers (2)

Thomas Matthews
Thomas Matthews

Reputation: 57678

Using an array makes things straight forward:

unsigned int frequencies[256] = {0};
while (std::getline(std::cin, str))
{
   const size_t length = str.length();
   for (unsigned int i = 0; i < length; ++i)
   {
       const char c = str[i];
       ++frequencies[c];
   }
}

Although, you may want to improve efficiency:

const size_t BUFFER_SIZE = 1024u * 1024u;
//...
char buffer[BUFFER_SIZE] = {0};
while (std::cin.read(&buffer[0], BUFFER_SIZE)
{
    const size_t chars_read = cin.gcount();
    for (unsigned int i = 0; i < chars_read; ++i)
    {
         const char c = buffer[i];
         ++frequencies[c];
    }
}

The above code uses block reading to improve input performance. No scanning for newline characters, just read straight into memory. Determine the frequencies from the characters in memory.

Edit 1: unsigned char
From the comments, an unsigned char may be a safer data type than char because char can be signed. This may be an issue when accessing the array slots because a signed char could be negative and negative indices are usually a bad thing. When you run it, if there are issues, replace the char type with unsigned char.

Upvotes: 2

user4442671
user4442671

Reputation:

As mentionned in the comments, using std::map makes this fairly straightforward.

One of the "fun" things about map is that the indexing operator creates new values "on demand" with a initial value of 0 for ints. So the actual code is essentially one line: chars[c] += 1;

#include <map>
#include <iostream>
#include <string>

using namespace std;

int main() {
  map<char, int> chars;

  string str;
  getline(cin, str);

  for(char c: str) {
    chars[c] += 1;
  }

  for(auto [character, frequency]: chars) {
      cout << character << " : " << frequency << "\n";
  }
}

N.B. There is one major difference between this and @ThomasMatthews's answer:

The map will only contain the characters that have been seen, whereas the array will contain 0s for all characters that were never hit. Which approach you use should be based on which of the two are more useful to you.

Upvotes: 2

Related Questions