efan

Reputation: 99

Count number of letters in a string from vector C++

I'm having some trouble figuring out this problem and was wondering if anyone could tell me what's going wrong. Essentially, I need to read however many strings are entered (until console input ends), push them to a vector, join all the strings from that vector into one big string, then count the occurrences of each character in that string.

For example, the inputs "Hello" "hello" "Stack" "stAck" would output

a: 2

c: 2

e: 2

h: 2

k: 2

l: 4

o: 2

s: 2

t: 2

Here's what I have so far. I've managed to pass all the strings from the vector into one big string, but when I try and count the number of letters in the string something goes wrong and I'm not sure what.

#include <iostream>
#include <vector>
#include <numeric>
using namespace std;

int numA(string s){
    int count = 0;
    for(int i = 0; i < s.size(); i++){
        if(s[i] == 'a' || s[i] == 'A'){
            count++;
        }
    }
    return count;
}

int numB(string s){
    int count = 0;
    for(int i = 0; i < s.size(); i++){
        if(s[i] == 'b' || s[i] == 'B'){
            count++;
        }
    }
    return count;
}

int main(){

    vector<string> words;
    string str;

    while(cin >> str){
        cin >> str;
        words.push_back(str);
    }
    str = accumulate(begin(words), end(words), str);


    cout << str << endl;
    cout << "a: " << numA(str) << endl;
    cout << "b: " << numB(str) << endl;
}

Right now I'm only testing for the number of A's, a's, B's and b's, but for some reason it sometimes prints the right number, while other times it prints a few less than it should, or none at all, even if there's clearly an a or b in the input.

I have a feeling it's something to do with the accumulate, but I'm not sure otherwise. If anyone can explain to me what could possibly be going wrong I'd really appreciate the help. Thanks!

Upvotes: 0

Views: 2809

Answers (1)

David C. Rankin

Reputation: 84569

There are a couple of issues you are missing. The primary one is that whenever you need to know how often each of a number of objects occurs, you simply need a frequency array with one element for each object in the range you need to measure, each element initially set to zero. When you are simply looking at character frequency, an array of 26 int (one for each letter of the alphabet) is all that is needed. The ASCII value allows a simple mapping from character to array index.

If you have a more complicated type that doesn't provide a simple way of mapping value => index, then you can use a container that stores key/count pairs, like a std::unordered_map or similar.
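For keys with no simple value-to-index mapping, a hash map from key to count can play the same role as the frequency array. A minimal sketch, assuming whole words are the objects being counted; the helper name count_words is hypothetical and not part of the code in this answer:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts how often each distinct word occurs.
// The map key replaces the array index; the mapped int is the count.
std::unordered_map<std::string, int>
count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> freq;
    for (const auto& w : words)
        ++freq[w];   // operator[] value-initializes a missing count to 0
    return freq;
}
```

Note that the keys here are case-sensitive, so "Hello" and "hello" are counted separately unless the words are lowercased first.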

Here, the ASCII table (see ASCII Table and Description) provides a simple way to map characters to array indexes. Additionally, since you are ignoring case, just convert all characters to either upper or lower case before mapping to the index.

Putting it all together, you could do something similar to:

#include <iostream>
#include <string>
#include <vector>
#include <cctype>

#define NCHAR 26

int main () {
    
    std::string s {};                       /* string */
    std::vector<std::string> vs {};         /* vector of strings */
    int lower[NCHAR] = {0};                 /* frequency array - initialized all zero */
    
    while (std::cin >> s)                   /* read each word */
        vs.push_back(s);                    /* add to vector */
    
    if (vs.size() == 0)                     /* validate at least 1 word stored */
        return 1;
    
    for (const auto& w : vs)                /* for each word in vector of strings */
        for (unsigned char c : w)           /* for each char in word (unsigned char keeps isalpha/tolower well-defined) */
            if (isalpha(c))                 /* if [a-zA-Z] */
                lower[tolower(c)-'a']++;    /* convert tolower, increment index */
        
    for (int i = 0; i < NCHAR; i++)         /* loop over frequency array */
        if (lower[i])                       /* if element not zero, output result */
            std::cout << (char)(i+'a') << ": " << lower[i] << '\n';
}

Example Use/Output

Using a simple heredoc to feed the words in your example to the program would result in the following:

$ ./bin/vectstr_frequency << 'eof'
> Hello
> hello
> Stack
> stAck
> eof
a: 2
c: 2
e: 2
h: 2
k: 2
l: 4
o: 2
s: 2
t: 2

The frequency array is a frequently used mapping to determine the occurrences of whatever set of objects you may be dealing with. It may take a bit to digest how mapping a value to an array index and then incrementing the element at that index produces the frequency of occurrences -- but the light-bulb will wink on and it will make perfect sense.
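As for why the original program sometimes comes up short: its loop calls cin >> str once in the while condition and again in the body, so the word read by the condition is overwritten before it is stored. A minimal sketch of that effect, using std::istringstream in place of std::cin; the names read_twice and read_once are hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Mirrors the question's loop: the condition reads a word, then the
// body reads the next word over it, so only every second word is kept.
std::vector<std::string> read_twice(const std::string& input) {
    std::istringstream in(input);
    std::vector<std::string> words;
    std::string str;
    while (in >> str) {
        in >> str;               // overwrites the word just read
        words.push_back(str);
    }
    return words;
}

// Reads once per iteration, so every word is kept.
std::vector<std::string> read_once(const std::string& input) {
    std::istringstream in(input);
    std::vector<std::string> words;
    std::string str;
    while (in >> str)
        words.push_back(str);
    return words;
}
```

With the input "Hello hello Stack stAck", read_twice keeps only "hello" and "stAck", while read_once keeps all four words. Separately, passing str as the initial value to accumulate prepends whatever str still holds at that point; an empty std::string{} is the safer initial value.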

(based on your comments, and to ensure the responses are not wiped on a comment cleanup by Sam, the following additional information is provided as part of the answer)

Frequency Array Explained

To help wrap your head around how the frequency array is populated, and how values are mapped to indexes and indexes mapped back to values, take char ch; to hold the current character. If ch == 'd', then ch - 'a' maps the current character 'd' to the index 3. How? From the ASCII table, 'd' - 'a' is 100 - 97, which is 3. (So your mapping simply subtracts the value of the first object in the range from the current value, ensuring the first object in your range maps to the first element of your frequency array.)

When outputting the results, for example for (int i = 0; i < NCHAR; i++), i represents the frequency array index which will be 0, 1, 2, ... So to map in the opposite direction, the character represented by the index will be i + 'a'. E.g., when the index is 3, you have 3 + 97 which is 100 which corresponds to the ASCII character 'd'. So you subtract 'a' to map to index, and add that same offset 'a' (97) to the index to map the index back to the character (value in your range).

If you look, you are just subtracting whatever is needed to map the first value in the range back to zero (the first index in your frequency array), and then going back from index to value, you are just adding that same offset to the index.
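The two directions of the mapping can be sketched as a pair of small functions. The names to_index and to_char are hypothetical, and the sketch assumes ASCII letters only:

```cpp
#include <cctype>

// Letter -> index: fold to lowercase, then subtract the first value
// in the range ('a'). Assumes c is an ASCII letter.
int to_index(char c) {
    return std::tolower(static_cast<unsigned char>(c)) - 'a';
}

// Index -> letter: add the same offset ('a') back to the index.
char to_char(int i) {
    return static_cast<char>(i + 'a');
}
```

For example, to_index('d') and to_index('D') both give 3, and to_char(3) gives 'd' again.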

To Determine the Frequency of Characters ' ' to '/'

You have 16 consecutive values in your range, the ASCII characters from ' ' (32) to '/' (47), so you need a frequency array with 16 elements, e.g. int punct[16] = {0};, to count the occurrences of the punctuation in that range. Then you can test if (' ' <= c && c <= '/') { /* process the punctuation */ }, where you map the character to an index with c - ' ' and map the index back to a character with i + ' '. It works the same way for any consecutive range of objects.

But note, in this case you can't use std::cin >> s to read each string; you must use getline(std::cin, s) to read a line at a time. Why? Your new range includes spaces, and >> skips leading whitespace and stops reading at the next whitespace character. So using std::cin >> s;, you will never read any spaces.
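The difference between the two ways of reading can be sketched with std::istringstream standing in for std::cin. The helper names read_words and read_lines are hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reads with operator>>: whitespace separates tokens and is discarded.
std::vector<std::string> read_words(const std::string& input) {
    std::istringstream in(input);
    std::vector<std::string> out;
    std::string s;
    while (in >> s)
        out.push_back(s);
    return out;
}

// Reads with std::getline: each line is kept whole, spaces included
// (only the trailing newline is dropped).
std::vector<std::string> read_lines(const std::string& input) {
    std::istringstream in(input);
    std::vector<std::string> out;
    std::string s;
    while (std::getline(in, s))
        out.push_back(s);
    return out;
}
```

For the input "My Dog Has Fleas!!\n", read_words yields four strings with no spaces in any of them, while read_lines yields one string that still contains the three spaces.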

Changing input to use getline() and adding the punct[] frequency array, you could do:

#include <iostream>
#include <string>
#include <vector>
#include <cctype>

#define NPUNC 16        /* if you need a constant, #define one (or more) */
#define NCHAR 26

int main () {
    
    std::string s {};                       /* string */
    std::vector<std::string> vs {};         /* vector of strings */
    int lower[NCHAR] = {0},                 /* frequency array - initialized all zero */
        punct[NPUNC] = {0};                 /* frequency array - for ' ' - '/' */
    
    while (getline (std::cin, s))           /* read each line */
        vs.push_back(s);                    /* add to vector */
    
    if (vs.size() == 0)                     /* validate at least 1 line stored */
        return 1;
    
    for (const auto& l : vs)                /* for each line in vector of strings */
        for (unsigned char c : l)           /* for each char in line (unsigned char keeps isalpha/tolower well-defined) */
            if (isalpha(c))                 /* if [a-zA-Z] */
                lower[tolower(c)-'a']++;    /* convert tolower, increment index */
            else if (' ' <= c && c <= '/')  /* if ' ' through '/' */
                punct[c-' ']++;             /* increment corresponding punct index */
    
    for (int i = 0; i < NPUNC; i++)         /* loop over punct frequency array */
        if (punct[i])                       /* if element not zero, output result */
            std::cout << (char)(i+' ') << ": " << punct[i] << '\n';
    
    std::cout.put('\n');
    
    for (int i = 0; i < NCHAR; i++)         /* loop over frequency array */
        if (lower[i])                       /* if element not zero, output result */
            std::cout << (char)(i+'a') << ": " << lower[i] << '\n';
}

Example Use/Output

$ echo 'My *#$/#%%!! Dog Has Fleas!!' | ./bin/vectstr_frequency+punct
 : 4
!: 4
#: 2
$: 1
%: 2
*: 1
/: 1

a: 2
d: 1
e: 1
f: 1
g: 1
h: 1
l: 1
m: 1
o: 1
s: 2
y: 1

(note: if you use printf in bash instead of echo, then "%%" will be counted as a single '%' -- make sure you understand why (man 3 printf will do))

Look things over and let me know if you have further questions.

Upvotes: 2
