Surfero

Reputation: 197

Trouble with dynamic arrays and string occurrence (C++)

I am working on a lab for my C++ class. I have a very basic version of my lab running; however, it does not behave quite the way it is supposed to.

The assignment:

Write a program that reads in a text file one word at a time. Store a word into a dynamically created array when it is first encountered. Create a parallel integer array to hold a count of the number of times that each particular word appears in the text file. If the word appears in the text file multiple times, do not add it into your dynamic array, but make sure to increment the corresponding word frequency counter in the parallel integer array. Remove any trailing punctuation from all words before doing any comparisons.

Create and use the following text file containing a quote from Bill Cosby to test your program.

I don't know the key to success, but the key to failure is trying to please everybody.

At the end of your program, generate a report that prints the contents of your two arrays in a format similar to the following:

Word Frequency Analysis

Word        Frequency
I           1
don't       1
know        1
the         2
key         2
...

I can figure out if a word repeats more than once in the array, but I cannot figure out how to not add/remove that repeated word to/from the array. For instance, the word "to" appears three times, but it should only appear in the output one time (meaning it is in one spot in the array).

My code:

using namespace std;

int main()
{
    ifstream file;
    file.open("Quote.txt");
    if (!file)
    {
        cout << "Error: Failed to open the file.";
    }

else
{
    string stringContents;
    int stringSize = 0;

    // find the number of words in the file
    while (file >> stringContents)
    {
        stringSize++;
    }

    // close and open the file to start from the beginning of the file
    file.close();
    file.open("Quote.txt");

    // create dynamic string arrays to hold the contents of the file
    // these will be used to compare with each other the frequency
    // of the words in the file
    string *mainContents = new string[stringSize];
    string *compareContents = new string[stringSize];

    // holds the frequency of each word found in the file
    int frequency[stringSize];

    // initialize frequency array
    for (int i = 0; i < stringSize; i++)
    {
        frequency[i] = 0;
    }

    stringContents = "";

    cout << "Word\t\tFrequency\n";
    for (int i = 0; i < stringSize; i++)
    {
        // if at the beginning of the iteration
        // don't check for the reoccurrence of the same string in the array
        if (i == 0)
        {
            file >> stringContents;

            // convert the current word to a c-string
            // so we can remove any trailing punctuation
            int wordLength = stringContents.length() + 1;
            char *word = new char[wordLength];
            strcpy(word, stringContents.c_str());

            // set this to no value so that if the word has punctuation
            // needed to remove, we can modify this string
            stringContents = "";

            // remove punctuation except for apostrophes
            for (int j = 0; j < wordLength; j++)
            {
                if (ispunct(word[j]) && word[j] != '\'')
                {
                    word[j] = '\0';
                }

                stringContents += word[j];
            }

            mainContents[i] = stringContents;
            compareContents[i] = stringContents;
            frequency[i] += 1;
        }

        else
        {
            file >> stringContents;
            int wordLength = stringContents.length() + 1;
            char *word = new char[wordLength];
            strcpy(word, stringContents.c_str());

            // set this to no value so that if the word has punctuation
            // needed to remove, we can modify this string
            stringContents = "";

            for (int j = 0; j < wordLength; j++)
            {
                if (ispunct(word[j]) && word[j] != '\'')
                {
                    word[j] = '\0';
                }

                stringContents += word[j];
            }

            // stringContents = "dont";
            //mainContents[i] = stringContents;
            compareContents[i] = stringContents;

            // search for reoccurrence of the word in the array
            // if the array already contains the word
            // don't add the word to our main array
            // this is where I am having difficulty
            for (int j = 0; j < stringSize; j++)
            {
                if (compareContents[i].compare(compareContents[j]) == 0)
                {
                    frequency[i] += 1;
                }

                else
                {
                    mainContents[i] = stringContents;
                }
            }
        }

        cout << mainContents[i] << "\t\t" << frequency[i];
        cout << "\n";
    }

}

file.close();

return 0;

}

I apologize if the code is difficult to understand/follow through. Any feedback is appreciated :]

Upvotes: 1

Views: 845

Answers (3)

Sridhar Nagarajan

Reputation: 1105

If you are allowed to use the STL, the entire problem can be solved easily, with much less code.

#include <iostream>
#include <fstream>
#include <string>
#include <unordered_map>
#include <cctype>

using namespace std;

int main()
{
    ifstream file("Quote.txt");
    string aword;
    unordered_map<string,int> wordFreq;
    if (!file.good()) {
        cout << "Error: Failed to open the file.";
        return 1;
    }
    else {
        while( file >> aword ) {
            //Remove trailing punctuation only, so the apostrophe in "don't" is kept
            while ( !aword.empty() && ispunct(static_cast<unsigned char>(aword.back())) )
                aword.pop_back();
            if ( aword.empty() )
                continue; //the token was nothing but punctuation
            unordered_map<string,int>::iterator got = wordFreq.find(aword);
            if ( got == wordFreq.end() )
              wordFreq.insert(make_pair(aword, 1)); //insert the unique strings with default freq 1
            else
              got->second++; //found - increment freq
         }
    }
    file.close();

    cout << "\tWord Frequency Analyser\n"<<endl;
    cout << "     Frequency\t    Unique Words"<<endl;
    unordered_map<string,int>::iterator it;
    for ( it = wordFreq.begin(); it != wordFreq.end(); ++it )
      cout << "\t" << it->second << "\t\t" << it->first << endl;

    return 0;
}

Upvotes: 1

brads3290

Reputation: 2075

Depending on whether or not your assignment requires that you use an 'array', per se, you could consider using a std::vector or even a System::Collections::Generic::List for C++/CLI.

Using vectors, your code might look something like this:

#include <vector>
#include <string>
#include <fstream>
#include <iostream>

using namespace std;

int wordIndex(string);      //Prototype a function to check if the vector contains the word
void processWord(string);   //Prototype a function to handle each word found

vector<string> wordList;    //The dynamic word list
vector<int> wordCount;      //The dynamic word count

int main() {
    ifstream file("Quote.txt");
    if (!file) {
        cout << "Error: Failed to read file" << endl;
    } else {
        //Read each word into the 'word' variable
        string word;
        while (file >> word) {  //Testing the read itself avoids processing the last word twice at end of file
            //Algorithm to remove punctuation here
            processWord(word);
        }
    }

    //Write the output to the console
    for (int i = 0, j = wordList.size(); i < j; i++) {
        cout << wordList[i] << ": " << wordCount[i] << endl;
    }

    return 0;
}

void processWord(string word) {
    int index = wordIndex(word);    //Get the index of the word in the vector - if the word isn't in the vector yet, the function returns -1.
                                    //This serves a double purpose: check if the word exists in the vector, and if it does, what its index is.
    if (index > -1) {
        wordCount[index]++;         //If the word exists, increment its word count in the parallel vector.
    } else {
        wordList.push_back(word);   //If not, add a new entry
        wordCount.push_back(1);     //in both vectors.
    }
}

int wordIndex(string word) {
    //Iterate through the word list vector
    for (int i = 0, j = wordList.size(); i < j; i++) {
        if (wordList[i] == word) {
            return i;               //The word has been found. Return its index.
        }
    }
    return -1;                      //The word is not in the vector. Return -1 to tell the program that the word hasn't been added yet.
}

I've tried to annotate any new code/concepts with comments to make it easy to understand, so hopefully you can find it useful.

As a side note, you may notice that I've moved a lot of the repetitive code out of the main function and into other functions. This makes the code easier to read and maintain, because you can divide each problem into smaller, easily manageable problems.
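The punctuation step left as a placeholder comment in the read loop could be filled in along these lines. This is only a minimal sketch: the helper name stripTrailingPunctuation is hypothetical and not part of the code above, and it assumes a C++11 compiler for back() and pop_back() on std::string.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the original answer): returns a copy of
// 'word' with any trailing punctuation removed. Punctuation inside the word,
// such as the apostrophe in "don't", is left alone.
std::string stripTrailingPunctuation(std::string word) {
    // The cast to unsigned char avoids undefined behaviour when char is
    // signed and the character value is negative.
    while (!word.empty() &&
           std::ispunct(static_cast<unsigned char>(word.back()))) {
        word.pop_back();
    }
    return word;
}
```

Inside the loop it would be called as processWord(stripTrailingPunctuation(word)), so that "success," and "success" are counted as the same word.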

Hope this can be of some use.

Upvotes: 0

Christophe

Reputation: 73366

The algorithm that you use is very complex for such a simple task. Here is what you should do:

  1. First reading pass: count the words to determine the maximum size of the arrays.
  2. Second reading pass: decide directly what to do with each word. If the string is already in the table, just increment its frequency; otherwise, add it to the table.
  3. Output the table.

The else block of your code would then look like:

    string stringContents;
    int stringSize = 0;

    // find the number of words in the file
    while (file >> stringContents)
        stringSize++;

    // close and open the file to start from the beginning of the file
    file.close();
    file.open("Quote.txt");

    string *mainContents = new string[stringSize];   // dynamic array for strings found
    int *frequency = new int[stringSize];           // dynamic array for frequency
    int uniqueFound = 0;                            // no unique string found

    for (int i = 0; i < stringSize && (file >> stringContents); i++)
    {
        // remove trailing punctuation (the cast keeps ispunct well-defined for negative char values)
        while (stringContents.size() && ispunct(static_cast<unsigned char>(stringContents.back())))
            stringContents.pop_back();

        // process string found
        bool found = false;
        for (int j = 0; j < uniqueFound; j++)
            if (mainContents[j] == stringContents) {  // if string already exists
                frequency[j]++;      // increment frequency
                found = true;
                break;               // each word is stored only once, so stop searching
            }
        if (!found) {   // if string not found, add it
            mainContents[uniqueFound] = stringContents;
            frequency[uniqueFound++] = 1;   // and increment number of unique strings found
        }
    }
    // display results
    cout << "Word\t\tFrequency\n";
    for (int i=0; i<uniqueFound; i++)
        cout << mainContents[i] << "\t\t" << frequency[i] <<endl;

    // release the dynamic arrays
    delete[] mainContents;
    delete[] frequency;
}

OK, it's an assignment, so you have to use arrays. Later, you could condense this code (with #include <map>) into:

    string stringContents;
    map<string, int> frequency; 

    while (file >> stringContents) {
        while (stringContents.size() && ispunct(static_cast<unsigned char>(stringContents.back())))
            stringContents.pop_back();
        frequency[stringContents]++;
    }
    cout << "Word\t\tFrequency\n";
    for (auto w:frequency) 
        cout << w.first << "\t\t" << w.second << endl;

and even have the words sorted alphabetically, because std::map iterates over its keys in sorted order.
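To try the map version without Quote.txt, it can be wrapped in a function that reads from a string instead of a file. This is a sketch under that assumption: the function name countWords and the use of std::istringstream are illustrative choices, not part of the assignment or of the code above.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical wrapper: counts how often each word occurs in 'text',
// ignoring trailing punctuation. std::map keeps the words in sorted order.
std::map<std::string, int> countWords(const std::string& text) {
    std::istringstream input(text);   // stands in for the ifstream
    std::map<std::string, int> frequency;
    std::string word;

    while (input >> word) {
        // Strip trailing punctuation only; "don't" keeps its apostrophe.
        while (!word.empty() &&
               std::ispunct(static_cast<unsigned char>(word.back()))) {
            word.pop_back();
        }
        if (!word.empty()) {
            ++frequency[word];        // operator[] starts new words at 0
        }
    }
    return frequency;
}
```

Called on the quote from the assignment, this should yield 13 distinct words, with "to" counted three times. Note that the ordering is by character code, so on ASCII-based systems the capitalised "I" sorts before all the lowercase words.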

Upvotes: 0
