Ben Atlas

Reputation: 127

C++ Why is my newline being misinterpreted?

I am writing a driver program in C++ that will eventually need to pass two strings to a function I am writing in a separate file. I am reading data from a file that is formatted like this:

ac: and
amo: love
amor: love
animal: animal
annus: year
ante: before, in front of, previously
antiquus: ancient
ardeo: burn, be on fire, desire
arma: arms, weapons
atque: and
aurum: gold
aureus: golden, of gold
aurora: dawn

I'm trying to get the Latin word into one string and the English equivalent into another string. Also, each time I get an English equivalent, I want to be able to send the two strings to my function. My code currently looks like this:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

//#include "tree.h"

int main(int argc, char* argv[])
{
    string latinWord   = "",
           englishWord = "";

    char   buffer;

    bool   isLatinWord = true;

    ifstream vocabFile;
    vocabFile.open(argv[1]);

    if (!vocabFile)
       cout << "File open failed." << endl;

    while(vocabFile.get(buffer))
    {

        if (isLatinWord)
        {
            if (buffer == ':')
                isLatinWord = false;

            else
                latinWord+= buffer;
        }

        else
        {
            if (buffer == ',') // indicates 1 of multiple equivs processed
            {
                cout << englishWord << " = " << latinWord << endl;
                englishWord = "";
            }

            else if (buffer == '\n') // indicates all english equivs processed
            {
                cout << englishWord << " = " << latinWord << endl;
                isLatinWord = true;
                englishWord = latinWord = ""; // reset both strings
            }

            else
                englishWord+= buffer;
        }
    }
}

The way this SHOULD work: a colon means the Latin word string is complete (the flag is set to false), and the English word string should then start being populated. The English word string should keep filling until a comma is hit (send the words to the function at this point) or a newline is hit (reset the flag, because all English equivalents have been processed at this point).

However, when I try to output the strings I would send to my functions they are totally messed up.

This is my output:

$ ./prog5 latin.txt
 = ac
 = amo
 = amor
 = animal
 = annus
 before = ante
 in front of = ante
 = anteusly
 = antiquus
 burn = ardeo
 be on fire = ardeo
 = ardeo
 arms = arma
 = armas
 = atque
 = aurum
 golden = aureus
 = aureus
 = aurora

[EDIT] This is my output after the isLatinWord flag was fixed. I think my code is recognizing newlines incorrectly. Does anyone see any errors or have any suggestions?

Thanks, Ben

Upvotes: 1

Views: 87

Answers (3)

Aaron McDaid

Reputation: 27133

Using getline to read (parts of) lines up to a desired delimiter:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;

int main() {
    ifstream data("data.txt");

    string latin_word;
    // Read up to, but not including, the colon. getline *discards* the colon itself.
    while (getline(data, latin_word, ':')) {
        cout << "Read latin word: <" << latin_word << '>' << endl;
        // Read the rest of the line
        string rest_of_line;
        getline(data, rest_of_line);
        // Now, we want to split it on commas. Easiest way is to build a stream object wrapped around this string
        istringstream rest_of_line_stream(rest_of_line);
        string english_phrase;
        while (
                  rest_of_line_stream >> std::ws,
                  getline(rest_of_line_stream, english_phrase, ',')
              ) {
            cout << '@' << latin_word << "@\t@" << english_phrase << '@' << endl;
        }
    }
}

Update: I had forgotten to discard enough whitespace. getline retains any leading whitespace by default. This can be a problem after the : and , in this data. Therefore, before any attempt to read an English phrase, I use rest_of_line_stream >> std::ws to read and discard any whitespace.

The inner while loop might seem a bit strange. I have two things inside the while parentheses: rest_of_line_stream >> std::ws and then getline(rest_of_line_stream, english_phrase, ','). They are separated by a comma, and this is the comma operator in C and C++. Basically, it just means the first expression is evaluated but its result is ignored. The boolean used for the while loop is then just the result of getline(rest_of_line_stream, english_phrase, ',').
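
To see the comma-operator loop in isolation, here is a minimal sketch that wraps it in a function. The name split_phrases is hypothetical and not part of the code above; it only assumes the input is the text that follows the colon on one line.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer's code): splits a
// comma-separated list into phrases, skipping whitespace before each one.
std::vector<std::string> split_phrases(const std::string& rest_of_line) {
    std::vector<std::string> phrases;
    std::istringstream stream(rest_of_line);
    std::string phrase;
    // Comma operator: `stream >> std::ws` runs first and its result is
    // discarded; the loop condition is the stream returned by getline.
    while (stream >> std::ws, std::getline(stream, phrase, ',')) {
        phrases.push_back(phrase);
    }
    return phrases;
}
```

Calling split_phrases(" before, in front of, previously") should yield the three phrases without their leading spaces. Trailing whitespace inside a phrase is not trimmed by this sketch.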

Upvotes: 0

chiastic-security

Reputation: 20520

This line

latinWord = true;

should be

isLatinWord = true;

Upvotes: 0

Jakub Strzadala

Reputation: 163

Newlines can also be represented as the two-character sequence \r\n (Windows-style line endings), so I would check for '\r' as well.
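
This would also explain the output in the question: if each English word ends in '\r', printing it moves the cursor back to the start of the line, and " = ante" then overwrites the beginning of "previously", leaving " = anteusly". A minimal sketch of one way to handle it, assuming the file may use either \n or \r\n line endings; strip_trailing_cr and read_pairs are hypothetical names, not from the question:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: removes one trailing '\r' left behind when a file
// with \r\n line endings is read up to '\n'.
std::string strip_trailing_cr(std::string line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
    return line;
}

// Hypothetical helper: reads "latin: english, english" lines and returns
// one (latin, english) pair per English equivalent.
std::vector<std::pair<std::string, std::string>> read_pairs(std::istream& in) {
    std::vector<std::pair<std::string, std::string>> pairs;
    std::string line;
    while (std::getline(in, line)) {
        line = strip_trailing_cr(line);
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) {
            continue;  // skip lines that have no colon
        }
        const std::string latin = line.substr(0, colon);
        std::istringstream rest(line.substr(colon + 1));
        std::string english;
        // Skip whitespace, then read up to the next comma (or end of line).
        while (rest >> std::ws, std::getline(rest, english, ',')) {
            pairs.emplace_back(latin, english);
        }
    }
    return pairs;
}
```

In the character-by-character loop from the question, a smaller change with the same intent would be to ignore any '\r' character before testing for '\n'.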

Upvotes: 1
