ladookie

Reputation: 1371

CSV parser works on Windows, not Linux

I'm parsing a CSV file that looks like this:

E1,E2,E7,E8,,,
E2,E1,E3,,,,
E3,E2,E8,,,
E4,E5,E8,E11,,,

I store the first entry in each line in a string, and the rest go in a vector of strings:

while (getline(file_input, line)) {
    stringstream tokenizer; 
    tokenizer << line;
    getline(tokenizer, roomID, ',');
    vector<string> aVector;
    while (getline(tokenizer, adjRoomID, ',')) {
        if (!adjRoomID.empty()) {
            aVector.push_back(adjRoomID);
        }
    }
    Room aRoom(roomID, aVector);
    rooms.addToTail(aRoom);
}

On Windows this works fine; on Linux, however, the first entry of each vector mysteriously loses its first character. For example, in the first iteration through the while loop:

roomID would be E1 and aVector would be 2 E7 E8

Then, in the second iteration, roomID would be E2 and aVector would be 1 E3

Notice the missing E's in the first entry of aVector.

When I put in some debugging code, it appears that the value is initially stored correctly in the vector, but then something overwrites it. Kudos to whoever figures this one out. Seems bizarre to me.

EDIT: Thank you, Erik. I finally understand. The file was created on Windows, so every line ends in \r\n. On Windows, a stream opened in text mode translates \r\n to \n, so getline never sees the \r. On Linux no such translation happens, so getline reads everything up to the \n into the string, including the \r. I was not accounting for this \r, and it was tripping me up. The problem wasn't that the E was missing; it was that I had an extra entry in the vector consisting of a single \r character, and my other classes couldn't handle that entry.
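
A minimal sketch of one way to account for the stray \r: strip it from the end of each line before tokenizing, so the rest of the parsing stays the same. The helper names stripTrailingCR and parseAdjacent are hypothetical (they are not part of the original code), and the sketch assumes a C++11 compiler for back() and pop_back().

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: drop a single trailing '\r' left over from a
// "\r\n" line ending that the stream did not translate.
void stripTrailingCR(std::string& line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
}

// Hypothetical helper: split one CSV line into the room ID (first field)
// and the non-empty fields that follow it.
std::vector<std::string> parseAdjacent(std::string line, std::string& roomID) {
    stripTrailingCR(line);
    std::stringstream tokenizer(line);
    std::getline(tokenizer, roomID, ',');

    std::vector<std::string> adjacent;
    std::string field;
    while (std::getline(tokenizer, field, ',')) {
        if (!field.empty()) {
            adjacent.push_back(field);
        }
    }
    return adjacent;
}
```

In the original loop, this would amount to calling stripTrailingCR(line) right after getline(file_input, line). Stripping once per line covers the case where the \r is attached to a real last field, not just the case where it forms an entry on its own.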

Upvotes: 0

Views: 543

Answers (3)

Walter Mundt

Reputation: 25271

Oops: misread your question, thought it was talking about not working on Windows. I'm leaving the answer here in case anyone stumbles upon this in need of it, but I don't think it will help you (the asker) in this case.

If you're on MSVC6, you could be encountering this bug with the getline function. There's a fix in the link.

For posterity, here's the info from the link:

SYMPTOM: "The Standard C++ Library template getline function reads an extra character after encountering the delimiter. Please refer to the sample program in the More Information section for details."

Modify the getline member function, which can be found in the following system header file string, as follows:

else if (_Tr::eq((_E)_C, _D)) {
    _Chg = true;
    // _I.rdbuf()->snextc();  /* Remove this line and add the line below. */
    _I.rdbuf()->sbumpc();
    break;
}

Note: Because the resolution involves modifying a system header file, extreme care should be taken to ensure that nothing else is changed in the header file. Microsoft is not responsible for any problems resulting from unwanted changes to the system header file.

Upvotes: 3

Erik

Reputation: 91300

I suspect that the \r in the Windows \r\n line ending could mess up the code doing your printing.

If you change to this if statement, does the problem disappear?

if (!adjRoomID.empty() && (adjRoomID[0] != '\r'))

EDIT: Fixed typo
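
To illustrate how a stray \r could produce the "missing E" in the printed output, here is a rough sketch that models a terminal line. It assumes the printing code writes each entry followed by a space, which the question does not show; renderLine is a hypothetical helper, not a library function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of one terminal line: '\r' moves the cursor back to
// column 0, and characters written afterwards overwrite what is there.
std::string renderLine(const std::string& text) {
    std::string screen;
    std::size_t cursor = 0;
    for (char c : text) {
        if (c == '\r') {
            cursor = 0;              // carriage return: back to line start
        } else {
            if (cursor < screen.size()) {
                screen[cursor] = c;  // overwrite an existing character
            } else {
                screen.push_back(c); // extend the line
            }
            ++cursor;
        }
    }
    return screen;
}
```

Under that assumption, the entries E2, E7, E8 and \r are written as "E2 E7 E8 \r ", and the space following the \r lands on top of the leading E. The vector contents are intact; only the display is affected.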

Upvotes: 2

Jack Saalwächter

Reputation: 211

Try some cout debugging. Print out the values as you read them in:

if (!adjRoomID.empty()) {
    cout << '"' << adjRoomID << '"' << endl;
    aVector.push_back(adjRoomID);
}

That will tell you if your strings are being read correctly from the get-go, and will also probably tell you if you're reading in extra weird characters from the file.
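
Quoting alone can still be misleading when the extra character is a \r, because the terminal acts on it instead of displaying it. A small sketch of a helper that makes such characters visible before printing follows; showControlChars is a hypothetical name, and it only handles the three control characters most likely to turn up here.

```cpp
#include <string>

// Hypothetical helper: replace common control characters with their
// escaped spelling so they show up in debug output.
std::string showControlChars(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '\r': out += "\\r"; break;
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            default:   out += c;     break;
        }
    }
    return out;
}
```

Used as cout << '"' << showControlChars(adjRoomID) << '"' << endl; an entry holding a lone carriage return prints as "\r" instead of moving the cursor.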

Upvotes: 0
