gluon
gluon

Reputation: 61

read multiple lines but especially... parsing them efficiently

I need to read multiple lines with specific keywords at the beginning. I have a basic problem and I'd need a hand to help me.

Here are the kind of input:

keyword1 0.0 0.0
keyword1 1.0 5.0
keyword2 10.0
keyword3 0.5
keyword4 6.0

rules are:

Here's the source I have:

using namespace std;

int main (int argc, const char * argv[]) {    
    vector<double> arrayInputs;
    string line;
    double keyword1_first, keyword1_second, keyword4, 
          keyword3, keyword2;
    bool inside_keyword1=false, after_keyword2=false, 
          keyword4_defined=false, keyword3_defined=false ;

//cin.ignore();

 while (getline(cin, line)) {
     if (inside_keyword1 && after_keyword2 && keyword3 && keyword4) {
         break;
     }
     else
     {
         std::istringstream split(line);
         std::vector<std::string> tokens;
         char split_char = ' ';
         for (std::string each; std::getline(split, each, split_char); tokens.push_back(each));

         if (tokens.size() > 2)
         {
             if (tokens[0] != "keyword1") return EXIT_FAILURE; // input format error
             else
             {
                 keyword1_first = atof(tokens[1].c_str());
                 keyword1_second = atof(tokens[2].c_str());

                 inside_keyword1 = true;
             }
         }
         else
         {
             if (tokens[0] == "keyword2")
             {
                 if (inside_keyword1)
                 {
                     keyword2 = atof(tokens[1].c_str());
                     after_keyword2 = true;
                 }

                 else return EXIT_FAILURE; // cannot define anything else keyword2 after keyword1 definition

             }
             else if (tokens[0] == "keyword3")
             {
                 if (inside_keyword1 && after_keyword2)
                 {
                     keyword3 = atof(tokens[1].c_str());
                     keyword3_defined  = true;
                 }
                 else return EXIT_FAILURE; // cannot define keyword3 outside a keyword1
             }
             else if (tokens[0] == "keyword4")
             {
                 if (inside_keyword1 && after_keyword2)
                 {
                     keyword4 = atof(tokens[1].c_str());
                     keyword4_defined  = true;
                 }
                 else return EXIT_FAILURE; // cannot define keyword4 outside a keyword1
             }
         }
     }
 }

 // Calculation


 // output


 return EXIT_SUCCESS;
}

My question is: Is there a more efficient way to go about this besides using booleans in the reading/parsing loop ?

Upvotes: 1

Views: 899

Answers (1)

You ask about something "more efficient", but it seems you don't have a particular performance objective. So what you want here is probably more like a Code Review. There's a site for that, in particular:

https://codereview.stackexchange.com/

But anyway...

You are correct to intuit that four booleans are not really called for here. That's 2^4 = 16 different "states", many of which you should never be able to get to. (Your specification explicitly forbids, for instance, keyword3_defined == true when after_keyword1 == false).

Program state can be held in enums and booleans, sure. That makes it possible for a "forgetful" loop to revisit a line of code under different circumstances, yet still remember what phase of processing it is in. It's useful in many cases, including in sophisticated parsers. But if your task is linear and simple, it's better to implicitly "know" the state based on having reached a certain line of code.

As an educational example to show the contrast I'm talking about, here's a silly state machine to read in a letter A followed by any number of letter Bs:

enum State {
    beforeReadingAnA,
    haveReadAnA,
    readingSomeBs,
    doneReadingSomeBs
};

State s = beforeReadingAnA;
char c;
while(true) {
    switch (s) {
        case beforeReadingAnA: 
            cin >> c;
            if (cin.good() && c == 'A') {
                // good!  accept and state transition to start reading Bs...
                s = haveReadAnA;
            } else {
                // ERROR: expected an A
                return EXIT_CODE_FAILURE;
            };
            break;

         case haveReadAnA:
            // We've read an A, so state transition into reading Bs
            s = readingSomeBs;
            break;

         case readingSomeBs:
            cin >> c;
            if (cin.good() && c == 'B') {
                // good!  stay in the readingSomeBs state
            } else if (cin.eof()) {
                // reached the end of the input after 0 or more Bs
                s = doneReadingSomeBs;
            } else {
                // ERROR: expected a B or the EOF
                return EXIT_CODE_FAILURE;
            }
            break;

         case doneReadingSomeBs:
             // all done! 
             return EXIT_CODE_SUCCESS;
    }
}

As mentioned, it's a style of coding that can be very, very useful. Yet for this case it's ridiculous. Compare with a simple linear piece of code that does the same thing:

// beforeReadingAnA is IMPLICIT

char c;
cin >> c;
if (cin.fail() || c != 'A')
   return EXIT_CODE_FAILURE;

// haveReadAnA is IMPLICIT

do {
    // readingSomeBs is IMPLICIT

    cin >> c;
    if (cin.eof())
       return EXIT_CODE_SUCCESS;
    if (cin.fail() || c != 'B')
       return EXIT_CODE_FAILURE;
}

// doneReadingSomeBs is IMPLICIT

All the state variables disappear. They are unnecessary because the program just "knows where it is". If you rethink your example then you can probably do the same. You won't need four booleans because you can put your cursor on a line of code and say with confidence what those four boolean values would have to be if that line of code happens to be running.

As far as efficiency goes, the <iostream> classes can make life easier than you have it here and be more idiomatically C++ without invoking C-isms like atof or ever having to use c_str(). Let's look at a simplified excerpt of your code that just reads the doubles associated with "keyword1".

string line;
getline(cin, line);
istringstream split(line);
vector<string> tokens;
char split_char = ' ';
string each;
while (getline(split, each, split_char)) {
    tokens.push_back(each);
}
double keyword1_first, keyword1_second;
if (tokens.size() > 2) {
    if (tokens[0] != "keyword1") {
        return EXIT_FAILURE; // input format error
    } else {
        keyword1_first = atof(tokens[1].c_str());
        keyword1_second = atof(tokens[2].c_str());
    }
}

Contrast that with this:

string keyword;
cin >> keyword;
if (keyword != "keyword1") {
    return EXIT_FAILURE;
}
double keyword1_first, keyword1_second;
cin >> keyword1_first >> keyword1_second;

Magic. Iostreams can detect the type you are trying to read or write. If it encounters a problem interpreting the input in the way you ask for, then it will leave the input in the buffer so you can try reading it another way. (In the case of asking for a string, the behavior is to read a series of characters up to whitespace...if you actually wanted an entire line, you would use getline as you had done.)

The error handling is something you'll have to deal with, however. It's possible to tell iostreams to use exception-handling methodology, so that the standard response to encountering a problem (such as a random word in a place where a double was expected) would be to crash your program. But the default is to set a failure flag that you need to test:

cin erratic behaviour

There's nuance to iostream, so you probably want to do some survey of Q&A...I've been learning a bit myself lately while answering/asking here:

Output error when input isn't a number. C++

When to use printf/scanf vs cout/cin?

Upvotes: 2

Related Questions