Amber Roxanna

Reputation: 1695

Reading data from a file gives the wrong count. What's best practice for reading in data?

I have four text files, each containing different words.

noun.txt has 7 words, Article.txt has 5 words, verb.txt has 6 words, and Preposition.txt has 5 words.

In the code below, inside my second for loop, an array of counts keeps track of how many words I've read in and from which file. For example, count[0] should be 5 words, which it is, but count[1] comes out as 8 when it should be 7. I went back to check the text file and I didn't make a mistake: it has 7 words. Is this a problem with how ifstream is behaving?

I've also been told that eof() is not good practice. What's best practice in industry for reading in data accurately? In other words, is there something better I can use than !infile.eof()?

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <cctype>
#include <array> // std::array
#include <string> // std::string

using namespace std;

const int MAX_WORDS = 100;

class Cwords{
    public:
        std::array<string,4> partsOfSpeech;
};

int main()
{
    Cwords elements[MAX_WORDS];

    int count[4] = {0,0,0,0};

    ifstream infile;

    string file[4] = {"Article.txt",
                      "Noun.txt",
                      "Preposition.txt",
                      "verb.txt"};

    for(int i = 0; i < 4; i++){
        infile.open(file[i]);
        if(!infile.is_open()){
            cout << "ERROR: Unable to open file!\n";
            system("PAUSE");
            exit(1);
        }

        for(int j = 0;!infile.eof();j++){
            infile >> elements[j].partsOfSpeech[i];
            count[i]++;
        }

        infile.close();
    }

    ofstream outfile;
    outfile.open("paper.txt");

    if(!outfile.is_open()){
        cout << "ERROR: Unable to open or create file.\n";
        system("PAUSE");
        exit(1);
    }



    outfile.close();
    system("PAUSE");
    return 0;
}

Upvotes: 0

Views: 316

Answers (3)

Dietmar Kühl

Reputation: 153840

The simple answer to reading data properly is this: always test after reading that the read operation was successful. This test does not involve the use of eof() (any book teaching the use of eof() prior to reading is worthy to be burnt immediately).

The main loop for reading the file should look something like this:

for (int j = 0; infile >> elements[j].partsOfSpeech[i]; ++j){
    ++count[i];
}

BTW, although the language is called "C++" and not "++C", don't use post-increment unless you actually use the result of the expression: in most cases it doesn't matter, but sometimes it does, and then post-increment can be significantly slower than pre-increment.
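To make the "test after reading" rule concrete, here is a minimal sketch that is not taken from the question's code. The helper name read_words is hypothetical, and it assumes the words are whitespace-separated and that a std::vector is acceptable in place of the fixed-size array:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the question's code): reads
// whitespace-separated words until an extraction fails, so only
// successful reads are ever stored or counted.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {        // the read itself is the loop condition
        words.push_back(word);  // reached only after a successful read
    }
    return words;
}
```

Passing an opened std::ifstream (or a std::istringstream for a quick experiment) should give a vector whose size() is the word count, regardless of trailing newlines or spaces in the input.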

Upvotes: 3

Kai Petzke

Reputation: 2954

Likely you have a line at the end of the file that looks empty, such as a trailing newline or some whitespace. My recommendation is to use code like the following:

#include <boost/algorithm/string.hpp>
#include <string>

...

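    std::string line;
    int cnt = 0;
    // Test the extraction itself: looping on !infile.eof() would process
    // one more (failed) read after the last word.
    while (infile >> line) {
        boost::algorithm::trim(line);  // operator>> already skips whitespace; kept as a safeguard
        if (!line.empty())
            words[filenr][cnt++] = line;
    }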

Note that I strongly recommend making the "outer" object the one indexed by the type of list (0 for Article.txt, 1 for Noun.txt, and so on) and the "inner" object a vector that holds the words. Your implementation is the other way round, which is sub-optimal because the shorter lists leave empty slots in each partsOfSpeech array. Also note that setting a hard upper limit of 100 words per file is dangerous: a longer file would overrun the elements array. Better to use std::vector for the actual word lists, as vectors grow as needed.
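The layout described above could be sketched roughly as follows. The names WordLists, kNumLists, load_into and load_file are made up for illustration, and the fixed count of four lists is taken from the question:

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumption: four word lists, as in the question (article, noun,
// preposition, verb). The outer array is indexed by list type; each
// inner vector grows to fit however many words its file holds.
constexpr std::size_t kNumLists = 4;
using WordLists = std::array<std::vector<std::string>, kNumLists>;

// Hypothetical helper: appends every word from `in` to lists[kind].
// at() throws std::out_of_range if kind is not a valid index.
void load_into(WordLists& lists, std::size_t kind, std::istream& in)
{
    std::string word;
    while (in >> word) {
        lists.at(kind).push_back(word);
    }
}

// Hypothetical helper: opens `path` and loads it; returns false if the
// file could not be opened.
bool load_file(WordLists& lists, std::size_t kind, const std::string& path)
{
    std::ifstream in(path);
    if (!in) {
        return false;
    }
    load_into(lists, kind, in);
    return true;
}
```

With this shape there is no separate count array to keep in sync: lists[1].size() is the number of nouns, and no list carries empty slots on behalf of a longer one.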

Upvotes: 0

Lochemage

Reputation: 3974

Have you checked to make sure there aren't any extra spaces or newlines at the end of your text file? Your extra 'word' may be due to trailing characters that are read before the eof is reached.
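A small experiment, sketched here with made-up function names and string streams standing in for the files, should show the effect. The first function mirrors the shape of the loop in the question, the second tests each read:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Mirrors the question's loop: the count is bumped for every attempt,
// including the final one that fails after trailing whitespace.
int count_with_eof_loop(std::istream& in)
{
    int count = 0;
    std::string word;
    while (!in.eof()) {
        in >> word;   // may fail; the result is never checked
        ++count;
    }
    return count;
}

// Counts only the extractions that actually succeeded.
int count_with_checked_reads(std::istream& in)
{
    int count = 0;
    std::string word;
    while (in >> word) {
        ++count;
    }
    return count;
}
```

For the input "boy girl dog\n" the first function should report 4 and the second 3, because eofbit is only set once a read actually runs into the end of the stream, which happens on the extra attempt after the newline. Without the trailing newline both report 3, which would explain why only some of the files come out one too high.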

Upvotes: 0
