Alcott

Reputation: 18575

file operation in binary vs text mode -- performance concern

In many projects, I have seen that data objects/structures are written to a file in binary mode, and then read back from the file in binary mode again.

I wonder why they do it in binary mode. Is there any performance difference between text and binary mode? If not, when should binary mode be used, and when text mode?

Upvotes: 15

Views: 10221

Answers (7)

astha

Reputation: 1

The binary format is more accurate for storing numbers, as they are stored in their exact internal representation. There are no conversions while saving the data, and therefore saving is much faster.

Upvotes: 0

James Kanze

Reputation: 153899

Historically, binary mode is to provide more or less transparent access to the underlying stream; text mode "normalizes" to a standard text representation, where lines are terminated by the single '\n' character. In addition, the system may impose restrictions on the size of a binary file, for example by requiring it to be a multiple of 128 or 512 bytes. (The first was the case of CP/M, the second of many of the DEC OS's.) Text files don't have this restriction, and in cases where the OS imposed it, the library will typically introduce an additional end of file character for text files. (Even today, most Windows libraries recognize the old CP/M end of file, 0x1A, when reading in text mode.) Because of these considerations, text mode is only defined over a limited set of binary values. (But if you write 200 bytes to a binary file, you may get back 256 or 512 when you re-read it. Historically, binary should only be used for text that is otherwise structured, so that you can recognize the logical end, and ignore these additional bytes.)

Also, you can seek pretty much arbitrarily in a file opened in binary mode; you can only seek to the beginning, or to a position you've previously memorized, in text mode. (This is because the line ending mappings mean that there is no simple relationship between the position in the file, and the position in the text stream.)

Note that this is orthogonal to whether the output is formatted or not: if you output using << (and input using >>), the IO is formatted, regardless of the mode in which the file was opened. And the formatting is always text; the iostreams are designed to manipulate streams of text, and only have limited support for non-text input and output.
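
A minimal sketch of that point (the file name and value are made up for illustration): even with std::ios::binary, operator<< still performs formatted, text output.

#include <fstream>

int main()
{
    std::ofstream out("example.dat", std::ios::binary);  // binary mode
    int n = 42;
    out << n << '\n';  // still writes the characters '4', '2', '\n', not the 4-byte int
}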

Today, the situation has changed somewhat: in many cases, we expect what we write to be readable from other machines, which supposes a well defined format, which may not be the format used natively. (Thus, for example, the Internet expects the two byte sequence 0x0D, 0x0A as a line ending, which is different than what is used internally in Unix and many other OS's.) If portability is a concern, you generally define a format, write it explicitly, and use binary mode to ensure that what you write is exactly what is written; similarly on input, you use binary format, and handle the conventions manually. If you're just writing to a local disk, which isn't shared, however, text mode is fine, and a bit less work.
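
As a sketch of that approach (the file name, function name, and message are assumptions for illustration): the CRLF line ending that many Internet protocols require is written explicitly, and the stream is opened in binary mode so the library does no translation of its own.

#include <fstream>
#include <string>

// Write one protocol line, terminating it with an explicit CRLF.
void write_line(std::ostream& out, const std::string& line)
{
    out.write(line.data(), static_cast<std::streamsize>(line.size()));
    out.write("\r\n", 2);   // explicit, well defined line ending
}

int main()
{
    std::ofstream out("message.dat", std::ios::binary);  // no line-ending translation
    write_line(out, "Hello, protocol");
}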

Again, both of these apply to text. If you want a binary format, you must use binary mode, but that's far from sufficient. You'll have to implement all of the formatted IO yourself. In such cases, I generally don't use std::istream or std::ostream (whose abstraction is text), but rather define my own stream types, deriving from std::ios_base (for the error handling conventions), and using std::streambuf (for the physical IO).

Finally, don't neglect the fact that all IO is formatted in some manner. Just writing a block of memory out to the file means that the format is whatever the current implementation happens to give you (which is generally undocumented, which means that you probably won't be able to read it in the future). If all you're doing is spilling to disk, and the only time you'll read it is with the same program, compiled with the same version of the same compiler, using the same compiler options, then you can just dump memory, provided the memory in question is only PODs, and contains no pointers. Otherwise, you have to define (and document) the format you use, and implement it. In such cases, I'd suggest using an existing format, like XDR, rather than inventing your own: it's a lot easier to write "uses XDR format" as documentation, rather than describing the actual bit and byte layout for all of the different types.
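
A minimal sketch of the "just dump memory" case described above, under the assumptions stated there (a POD with no pointers, read back by the same build of the same program); the struct and file name are invented for illustration:

#include <fstream>

struct Record {        // POD: no pointers, no virtual functions
    int    id;
    double value;
};

int main()
{
    Record saved = { 1, 3.14 };
    std::ofstream out("record.bin", std::ios::binary);
    out.write(reinterpret_cast<const char*>(&saved), sizeof saved);
    out.close();

    Record loaded;
    std::ifstream in("record.bin", std::ios::binary);
    in.read(reinterpret_cast<char*>(&loaded), sizeof loaded);
}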

Upvotes: 8

perilbrain

Reputation: 8187

In binary mode you can use all 256 values a byte can take, whereas in text mode you are restricted to the printable characters, hardly more than 100 of them. So you can pack more than twice as much data into the same space.
Further, there are cases where you have to abide by a structure specification exactly, such as a network packet like an IPv4 header.

Let us take an example

// Assume no padding for the sake of the example
struct abc
{
    int    a : 4;   // 4-bit bit field
    char   b;
    double c;
} A[] = { { .a = 4, .b = 'a', .c = 7.45 },
          { .a = 2, .b = 'z', .c = 3.2 } };   // .a must fit in the 4-bit field

Isn't it difficult to store bit fields in text mode? Obviously you are going to lose a lot.

However, you can save a data object in text format, as is done with MIME, but it will require an extra routine to convert it to and from binary; performance is hammered.

Upvotes: 2

Glenn

Reputation: 1167

Binary is faster. Consider an integer stored in 32 bits (4 bytes), such as 123456. If you were to write this out as binary (which is how it is represented in the computer) it would take 4 bytes (ignoring padding between items for alignment in structures).

To write the number as text, it has to be converted to a string of characters (some overhead to convert, plus memory to store it) and then written out; it will take at least 6 bytes, since there are 6 characters to represent the number. This does not include any additional padding, such as spaces for alignment or delimiters to read/separate the data.

Now consider if you had several thousand items: the additional time can add up, and the data requires more space, which takes longer to read in; and then there is the additional time to convert the text back to binary once you have read each value into memory.
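
A small sketch of the comparison (assuming a 4-byte int; the file names are illustrative):

#include <fstream>

int main()
{
    int n = 123456;

    std::ofstream bin("n.bin", std::ios::binary);
    bin.write(reinterpret_cast<const char*>(&n), sizeof n);  // 4 bytes, no conversion

    std::ofstream txt("n.txt");
    txt << n << '\n';  // converted to the 6 characters "123456" plus a newline
}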

The advantage of text is that it is much easier for people to read than binary data or hex dumps of the data.

Upvotes: 21

wallyk

Reputation: 57764

Only a few operating systems are affected by the choice between binary and text mode. None of the Unix or Linux systems do anything special for text mode—that is, text is the same as binary.

Windows and VMS in particular transform data in text mode. Windows transforms \n into \r\n when writing to a file and the converse when reading. VMS has a file record structure to observe, so in the default mode, it translates \n into a record delimiter.
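
A short sketch of that translation (file names are illustrative); the same statement produces different bytes on Windows depending on the mode:

#include <fstream>

int main()
{
    std::ofstream text_file("out_text.txt");                 // text mode (the default)
    text_file << "line\n";   // on Windows the '\n' is written as "\r\n"

    std::ofstream bin_file("out_bin.txt", std::ios::binary); // binary mode
    bin_file << "line\n";    // the '\n' is written as the single byte 0x0A everywhere
}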

Where it is different, binary is faster. If it is not different, it makes no difference.

Upvotes: 2

Some programmer dude

Reputation: 409136

If your program is the only program that is going to use the file, you can save internal structures "as is" using binary files.

However, if you want to exchange the files with other programs, or over the Internet, then binary formats are not that good. Think, for example, about the problem of big-endian vs. little-endian machines. Also, the receiver of the files or data will most likely not have access to your code and your structures, so a text-based format might be easier to parse and map into their own structures.

About performance, it's true that reading and writing your internal structures directly will be quicker, because you don't have to translate them (also known as marshaling) into another format.
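
As a hedged sketch of such marshaling (the function name and the choice of big-endian "network order" are assumptions, not a standard API), a 32-bit value can be written byte by byte in a fixed order so that big- and little-endian machines can exchange the file:

#include <cstdint>
#include <ostream>

// Write a 32-bit value in big-endian order, independent of the host's endianness.
void write_uint32_be(std::ostream& out, std::uint32_t value)
{
    const char bytes[4] = {
        static_cast<char>((value >> 24) & 0xFF),
        static_cast<char>((value >> 16) & 0xFF),
        static_cast<char>((value >> 8)  & 0xFF),
        static_cast<char>(value & 0xFF),
    };
    out.write(bytes, 4);   // the stream should be opened in binary mode
}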

Upvotes: 7

SingerOfTheFall

Reputation: 29966

If you read/write a file in text mode, you are operating on text. It may be subject to encoding errors and OS-specific format changes, though sometimes it will work just fine. In binary mode, though, you will not run into these restrictions. Also, text mode may do funny things with \n characters, such as replacing them with \r\n.

The fopen reference, for example, says:

In the case of text files, depending on the environment where the application runs, some special character conversion may occur in input/output operations to adapt them to a system-specific text file format. In many environments, such as most UNIX-based systems, it makes no difference to open a file as a text file or a binary file; Both are treated exactly the same way, but differentiation is recommended for a better portability.
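
For reference, the distinction is selected by the 'b' in the fopen mode string (file names are illustrative):

#include <cstdio>

int main()
{
    std::FILE* text_fp = std::fopen("data.txt", "r");    // text mode: line endings may be translated
    std::FILE* bin_fp  = std::fopen("data.bin", "rb");   // binary mode: bytes pass through untouched
    if (text_fp) std::fclose(text_fp);
    if (bin_fp)  std::fclose(bin_fp);
}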

Upvotes: 3
