NoobTove

Reputation: 31

Why does the newline character in an ifstream file occupy 2 bytes when read by this code?

I used a file with 15 lines of 2 characters each, so I expected the file to be about 44 bytes, but tellg reports a size of 58. I also collected, in an array, every position at which the code found a newline character; the positions turned out to be consecutive, which confirmed my suspicion. Thank you!
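To check the arithmetic: if the last line has no trailing newline (an assumption, but it is what makes 44 the expected figure), the file holds 15 × 2 = 30 letters plus 14 line breaks. With one-byte LF line breaks that is 30 + 14 = 44 bytes; with two-byte CR LF line breaks it is 30 + 28 = 58 bytes, which matches what tellg reported. The sketch below reproduces both sizes by writing the file in binary mode (so no newline translation happens) and measuring it with tellg. The helper name write_and_measure and the file names are made up for illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes `lines` lines of "ab" separated by `eol`
// (no terminator after the last line) in binary mode, then returns the
// size that tellg reports after seeking to the end of the file.
long long write_and_measure(const std::string& path, int lines, const std::string& eol)
{
    {
        std::ofstream out(path, std::ios::binary);
        for (int k = 0; k < lines; ++k) {
            out << "ab";
            if (k + 1 < lines)
                out << eol;  // separator between lines only
        }
    }  // out is closed here, so its buffer is flushed to the file

    std::ifstream in(path, std::ios::binary);
    in.seekg(0, std::ios::end);
    return static_cast<long long>(in.tellg());
}
```

For example, write_and_measure("lf.txt", 15, "\n") should return 44, and write_and_measure("crlf.txt", 15, "\r\n") should return 58.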

//Tailfile - This program accepts a file and prints the last 10 lines.
#include <fstream>
#include <iostream>
using namespace std;

//This function scans backwards from the end of the file, counting newline
//characters, to determine the number of lines and how to display them.
int lineidentifier(fstream& tailfile, long& position)
{
    tailfile.seekg(0, ios::end);//sets the read position at the end of the file.
    long n = 0;//counter for the number of lines
    long i = tailfile.tellg();//byte offset, initialized to the end of the file
                              //(that is, the size of the file in bytes).
    char ch = '\0';//holds the character just read; initialized because the
                   //first get() happens at end of file and stores nothing.
    while (n < 10 && i >= 0)//continue until 10 newlines have been found or
                            //every byte back to the start has been checked.
    {
        tailfile.seekg(i, ios::beg);//sets the read position to byte i,
                                    //counted from the beginning of the file.
        cout << "1. " << i << endl;//DEBUGGING EXTRA
        tailfile.get(ch);//reads the character at byte i
        tailfile.clear();//clears the eof flag set by the first iteration,
                         //which reads at the end of the file.
        cout << "2. " << i << endl;//DEBUGGING EXTRA
        if (ch == '\n')//a newline character means a line has been identified.
        {
            n++;//Increment n accordingly.
            position = i;//i was the read position before the character was
                         //read, so it is the position of the newline.
            cout << position << endl;//DEBUGGING EXTRA
            cout << ch << endl;//DEBUGGING EXTRA
            i--;
        }
        i--;
        cout << "4. " << i << endl;//DEBUGGING EXTRA
    }
    cout << i << endl;//DEBUGGING EXTRA
    if (i <= 1)//If i is at most 1, the scan reached the beginning of the
               //file, so the file does not have more than 10 lines.
        return 0;
    else
        return 1;
}
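One way to sidestep the problem, offered as a sketch under stated assumptions rather than a fix to the code above, is to open the file in binary mode so that offsets are raw byte counts, split on the LF byte that both the Unix (LF) and Windows (CR LF) conventions contain, and drop a leftover '\r' from each line. The function name tail_lines is hypothetical, and for simplicity it reads the whole file into memory; for very large files the same backward scan could be applied to blocks read from the end with seekg.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the last `count` lines of the file at `path`.
// Binary mode keeps the bytes untranslated, so sizes and offsets are raw
// byte counts. Lines are split on LF ('\n'), which both LF and CR LF files
// contain, and a trailing '\r' from a CR LF pair is removed.
std::vector<std::string> tail_lines(const std::string& path, std::size_t count)
{
    std::ifstream in(path, std::ios::binary);
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size <= 0)
        return {};  // empty file, or the file could not be opened

    // For simplicity the whole file is read into memory.
    std::string data(static_cast<std::size_t>(size), '\0');
    in.seekg(0, std::ios::beg);
    in.read(&data[0], size);

    std::size_t end = data.size();
    if (data[end - 1] == '\n')
        --end;  // a final line terminator does not start a new line

    std::vector<std::string> lines;  // collected from last to first
    while (lines.size() < count) {
        const std::size_t nl =
            (end == 0) ? std::string::npos : data.rfind('\n', end - 1);
        const std::size_t start = (nl == std::string::npos) ? 0 : nl + 1;
        std::string line = data.substr(start, end - start);
        if (!line.empty() && line.back() == '\r')
            line.pop_back();  // drop the CR of a CR LF pair
        lines.push_back(line);
        if (nl == std::string::npos)
            break;  // reached the beginning of the file
        end = nl;
    }
    std::reverse(lines.begin(), lines.end());
    return lines;
}
```

With a 15-line file that uses CR LF endings, tail_lines(path, 10) should return the last 10 lines with no '\r' characters left in them.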

Upvotes: 3

Views: 1229

Answers (3)

Pete Becker

Reputation: 76235

The answers I've seen so far are essentially correct, but they muddle two different notions. '\n' and '\r' are escape sequences; each one represents a single character whose value is implementation-dependent. Typically those are 0x0A and 0x0D because that's often convenient, but they are not required to have those values.

When you write the character '\n' to an output stream opened in text mode, the runtime library does whatever is needed to produce a new line. On Unix, the convention is that the byte 0x0A means "start a new line". On Windows, the byte 0x0A means "move down to the next line" (line feed) and the byte 0x0D means "move to the start of the current line" (carriage return); the combination 0x0D 0x0A starts a new line.
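A minimal way to observe this translation, assuming nothing beyond the standard library, is to write the same string once in text mode and once in binary mode and then count the bytes that actually land in the file. The helper bytes_written and the file names are hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes `text` to `path` using `mode`, then reopens the
// file in binary mode and returns how many bytes actually ended up on disk.
long long bytes_written(const std::string& path, const std::string& text,
                        std::ios::openmode mode)
{
    {
        std::ofstream out(path, mode);  // ofstream always adds ios::out
        out << text;
    }  // out is closed here, so its buffer is flushed to the file

    std::ifstream in(path, std::ios::binary);
    in.seekg(0, std::ios::end);
    return static_cast<long long>(in.tellg());
}
```

Calling bytes_written("text.txt", "a\nb\n", std::ios::out) and bytes_written("bin.txt", "a\nb\n", std::ios::out | std::ios::binary) should both report 4 bytes on Linux; on Windows the text-mode call typically reports 6, because each '\n' is written as the two bytes 0x0D 0x0A.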

In the ASCII encoding, the values 0x0A and 0x0D represent a line feed and a carriage return, respectively. They have no inherent connection to the C/C++ escape sequences '\n' and '\r'.
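To make the distinction concrete: a character literal such as '\n' has type char, so it is always exactly one char in memory, while its numeric value is up to the implementation. The check below is a small illustrative sketch; the function name line_controls_are_ascii is hypothetical, and the runtime check holds on ASCII-based implementations but is not guaranteed by the C++ standard.

```cpp
// Each escape sequence is a single char literal, so sizeof is always 1.
static_assert(sizeof('\n') == 1 && sizeof('\r') == 1,
              "'\\n' and '\\r' are single chars");

// True on ASCII-based implementations, where '\n' is line feed (0x0A) and
// '\r' is carriage return (0x0D). The C++ standard does not require this.
bool line_controls_are_ascii()
{
    return '\n' == 0x0A && '\r' == 0x0D;
}
```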

Upvotes: 0

Richard

Reputation: 61239

Linux uses \n (Line Feed, 0x0A) as its line terminator.

Windows/DOS uses \r\n (Carriage Return (0x0D) followed by Line Feed (0x0A)) as its line terminator.

You are most likely reading a file with DOS/Windows line endings.
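If you want to confirm which convention a particular file uses without leaving C++, one option is to read its raw bytes in binary mode and look for a CR immediately before an LF. The helper detect_eol below is a hypothetical sketch; it reports "CRLF" as soon as it finds one CR LF pair, so a file with mixed endings is also reported as "CRLF".

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reports the line-ending convention of `path` by
// reading its raw bytes (binary mode, so nothing is translated).
// Returns "CRLF" if any LF is preceded by CR, "LF" if LFs appear only
// without a CR, and "none" if the file contains no LF at all.
std::string detect_eol(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    char prev = '\0';
    char ch;
    bool saw_lf = false;
    while (in.get(ch)) {
        if (ch == '\n') {
            if (prev == '\r')
                return "CRLF";
            saw_lf = true;
        }
        prev = ch;
    }
    return saw_lf ? "LF" : "none";
}
```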

This answer provides further details.

Upvotes: 3

jpo38

Reputation: 21514

Open your file with a binary file editor such as Hexedit. You'll most likely see that line breaks are encoded as \r\n (0x0D, "carriage return", followed by 0x0A, "line feed"), not just \n.
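If a hex editor isn't at hand, a few lines of C++ can show the same raw bytes. This sketch (the function name hex_dump is hypothetical) opens the file in binary mode so that nothing is translated and formats each byte as two hex digits; in a file with Windows line endings, each line break shows up as 0D 0A.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the raw bytes of `path` as space-separated,
// two-digit uppercase hex values, e.g. "61 0D 0A" for "a" followed by CR LF.
std::string hex_dump(const std::string& path)
{
    static const char digits[] = "0123456789ABCDEF";
    std::ifstream in(path, std::ios::binary);  // binary: no newline translation
    std::string out;
    char ch;
    while (in.get(ch)) {
        const unsigned char byte = static_cast<unsigned char>(ch);
        if (!out.empty())
            out += ' ';
        out += digits[byte >> 4];    // high nibble
        out += digits[byte & 0x0F];  // low nibble
    }
    return out;
}
```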

By the way, you can simply read the file line by line with getline:

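#include <fstream>
#include <string>

int main()
{
    std::ifstream infile("thefile.txt");
    std::string line;
    while (std::getline(infile, line))
    {
        // line holds the current line, without its terminating '\n'
    }
}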

Then you don't have to worry about how the end of line (EOL) was encoded, at least when the file uses your platform's native convention.
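One caveat, sketched below as an assumption about typical behavior: std::getline removes only the '\n'. In text mode on Windows the CR LF pair is already turned into '\n', but when a file with CR LF endings is read on Linux, each line keeps a trailing '\r'. A small wrapper (read_lines is a hypothetical name) can strip it so the result is the same on both platforms.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of `path` with std::getline and
// removes a trailing '\r' that remains when CR LF files are read on
// platforms whose text mode does not translate CR LF.
std::vector<std::string> read_lines(const std::string& path)
{
    std::ifstream infile(path);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(infile, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```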

Upvotes: 1
