user51462

Reputation: 2072

Do `seekg()` and `seekp()` operate on characters or bytes?

Page 393 of 'Programming: Principles and Practice' introduces seekg() and seekp() as follows:

However, if you must, you can use positioning to select a specific place in a file for reading or writing. Basically, every file that is open for reading has a "read/get position" and every file that is open for writing has a "write/put position":

[diagram]


fstream fs {name};          // open for input and output
if (!fs) error("can't open ", name);

fs.seekg(5);                // move reading position to 5 (the 6th character)
char ch;
fs >> ch;                   // read and increment reading position
cout << "character[5] is " << ch << " {" << int(ch) << "}\n";

fs.seekp(1);                // move writing position to 1
fs << 'y';                  // write and increment writing position 

In the code snippet, "position" is expressed in terms of characters, e.g. position 5 is referred to as the "6th character". This confused me because up until this point, we've thought of a file as a sequence of bytes, so I would have expected position to be expressed in terms of bytes (in the example above, I thought 5 was the position of the 6th byte of the file).

So, I tried to test it out by writing to position 1 of a file containing a single wide character:

wide.txt

test.cpp

#include "../std_lib_facilities.h"

int main() {

    fstream fs {"wide.txt"};

    fs.seekp(1);
    fs << 'y';
    fs.close();

    return 0;

}

After running this code, wide.txt looks like this:

�y�

It seems that the character 'y' was written to the 2nd byte of the file, not to the 2nd character, which would imply that position refers to a byte, not a character. So, why does the code snippet in the book say "character"?

I also noticed that the function signature is basic_ostream& seekp( pos_type pos ); (see cppreference), but I can't find an explanation of whether pos_type refers to a character or a byte.

The reference on cplusplus.com also seems to define position in terms of characters (emphasis added):

Sets the position where the next character is to be inserted into the output stream.

As does the following comment on a Reddit thread (emphasis added):

A streampos IS NOT an integer, it's not some byte position in a stream. It represents a character position in the stream, and the type holds some stream state information for the purpose of code conversion and character position.

But this seems to contradict what I see in my example, where fs.seekp(1) seems to overwrite byte 1 (the 2nd byte).

Upvotes: 2

Views: 612

Answers (3)

vitaut

Reputation: 55685

seekg() and seekp() operate on code units, which are often referred to as "characters" in C++, although that term is heavily overloaded and may mean other things. They definitely don't operate on bytes, which is easy to see if you consider that for wide streams the buffer elements have type wchar_t.

Quoting https://en.cppreference.com/w/cpp/io/basic_streambuf:

The controlled character sequence (buffer) is an array of CharT which, at all times, represents a subsequence, or a "window" into the associated character sequence.

Upvotes: 3

KamilCuk

Reputation: 141748

Do seekg() and seekp() operate on characters or bytes?

Both?

why does the code snippet in the book say "character"?

char is a character. It represents one byte. In this context, they are the same.

There is also wfstream, which operates on wide characters. Each wide character is a single wchar_t, which typically occupies several bytes.

whether pos_type refers to a character or a byte.

This is all an abstraction; it depends on what the stream refers to. For fstream, a character is one char, which is one byte. For wfstream, a character has sizeof(wchar_t) bytes. For my imaginary typedef basic_ios<__uint128_t> my_super_stream_with_16_bytes_characters; a character has 16 bytes.

contradict what I see in my example, where fs.seekp(1) seems to overwrite byte 1

No. In this case fs is simply a stream in which one character is one byte (on your operating system and your implementation).

Upvotes: 0

Ted Lyngmo

Reputation: 117841

Do seekg() and seekp() operate on characters or bytes?

In the context of the seek* functions they are the same thing, and that has only a very loose relation to visible (or invisible) characters such as graphemes, grapheme-like units or symbols.

For encodings like UTF-8, stepping "a character" needs clarification. A seek* character does not take code points into account, so stepping one seek* character may put the read/write position somewhere inside a 4-octet Unicode character, and reading or writing there may produce an invalid code point or some unexpected grapheme.

Upvotes: 0
