Theodore

Reputation: 309

Binary input of text file

Programming Principles and Practice says in Chapter 11: "In memory, we can represent the number 123 as an integer value (each int on 4 bytes) or as a string value (each character on 1 byte)".

I'm trying to understand what is stored in memory when a text file is read in binary mode, so I'm writing out the contents of the vector v.

If the input file contains this text: "test these words"

The output file shows these numbers: 1953719668 1701344288 1998611827 1935962735 168626701 168626701 168626701 168626701 168626701 168626701

I tried converting each char of "test" to binary and got 01110100 01100101 01100101 01110100. If I treat this as a 4-byte integer and convert it to decimal, I get 1952802164, which is still different from the output.

How is this done correctly, so I can understand what's going on? Thanks!

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cmath>
#include <sstream>
#include <fstream>
#include <iomanip>
using namespace std;

template <class T>
char *as_bytes(T &i) // treat a T as a sequence of bytes
{
    void *addr = &i; // get the address of the first byte of memory used to store the object
    return static_cast<char *>(addr); // treat that memory as bytes
}

int main()
{
    string iname{"11.9_in.txt"};
    ifstream ifs {iname,ios_base::binary}; // note: stream mode
    string oname{"11.9_out.txt"};
    ofstream ofs {oname,ios_base::binary}; // note: stream mode

    vector<int> v;
    // read from binary file:
    for(int x; ifs.read(as_bytes(x),sizeof(int)); ) // note: reading bytes
        v.push_back(x);

    for(int x : v)
        ofs << x << ' ';

}

Upvotes: 2

Views: 694

Answers (1)

MikeCAT

Reputation: 75062

Let me assume you are using a little-endian machine (for example, x86) and an ASCII-compatible character encoding (such as Shift_JIS or UTF-8).

The text test is stored as the bytes 74 65 73 74 (in hexadecimal).

On a little-endian machine, the higher-order bytes of a multi-byte integer are placed at higher addresses, so the byte at the lowest address is the least significant one.
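
To make the byte order concrete, here is a minimal sketch that lists the bytes of a 32-bit value in the order a little-endian machine would store them, lowest address first. The helper name little_endian_bytes is made up for illustration, and the shifts compute the layout arithmetically rather than inspecting real memory, so the result is the same on any machine.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the book or the question): return the bytes
// of a 32-bit value in little-endian storage order, lowest address first.
std::array<unsigned char, 4> little_endian_bytes(std::uint32_t value)
{
    std::array<unsigned char, 4> bytes{};
    for (std::size_t i = 0; i < 4; ++i) {
        // byte i holds bits 8*i .. 8*i+7, so the least significant byte comes first
        bytes[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    }
    return bytes;
}
```

For the value 0x74736574 this gives the bytes 74 65 73 74, which are the ASCII codes of t, e, s, t in that order.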

Therefore, reading these bytes as a 4-byte integer interprets them as 0x74736574, which is 1953719668 in decimal.
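
The same arithmetic can be sketched in code. This is only an illustration under the assumptions above (4-byte int, little-endian interpretation); pack_little_endian is a hypothetical helper, not something from the book, and it builds the value with shifts instead of reading raw memory so it does not depend on the machine it runs on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: combine four bytes the way a little-endian machine
// interprets them as one 32-bit integer (first byte = least significant).
std::uint32_t pack_little_endian(const std::string& four_bytes)
{
    assert(four_bytes.size() == 4); // assumes sizeof(int) == 4, as in the question
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // go through unsigned char so bytes above 0x7F are not sign-extended
        const auto byte = static_cast<unsigned char>(four_bytes[i]);
        value |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return value;
}
```

With this helper, pack_little_endian("test") gives 1953719668 and pack_little_endian(" the") gives 1701344288, matching the first two numbers in the output file. The repeated 168626701 is 0x0A0D0A0D, i.e. the bytes 0D 0A 0D 0A, which suggests the input file ends with several CR LF line breaks.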

Upvotes: 6
