qed

Reputation: 23134

Proper way to read binary file in C++?

I have been searching the internet for a way to read binary files in C++, and I have found two snippets that kind of work:

No.1:

#include <iostream>
#include <fstream>

int main(int argc, const char *argv[])
{
   if (argc < 2) {
      ::std::cerr << "Usage: " << argv[0] << "<filename>\n";
      return 1;
   }
   ::std::ifstream in(argv[1], ::std::ios::binary);
   while (in) {
      char c;
      in.get(c);
      if (in) {
         // ::std::cout << "Read a " << int(c) << "\n";
         printf("%X ", c);
      }
   }
   return 0;
}

Result:

6C 1B 1 FFFFFFDC F FFFFFFE7 F 6B 1 

No.2:

#include <stdio.h>
#include <iostream>

using namespace std;

// An unsigned char can store 1 Bytes (8bits) of data (0-255)
typedef unsigned char BYTE;

// Get the size of a file
long getFileSize(FILE *file)
{
    long lCurPos, lEndPos;
    lCurPos = ftell(file);
    fseek(file, 0, 2);
    lEndPos = ftell(file);
    fseek(file, lCurPos, 0);
    return lEndPos;
}

int main()
{
    const char *filePath = "/tmp/test.bed";
    BYTE *fileBuf;          // Pointer to our buffered data
    FILE *file = NULL;      // File pointer

    // Open the file in binary mode using the "rb" format string
    // This also checks if the file exists and/or can be opened for reading correctly
    if ((file = fopen(filePath, "rb")) == NULL)
        cout << "Could not open specified file" << endl;
    else
        cout << "File opened successfully" << endl;

    // Get the size of the file in bytes
    long fileSize = getFileSize(file);

    // Allocate space in the buffer for the whole file
    fileBuf = new BYTE[fileSize];

    // Read the file in to the buffer
    fread(fileBuf, fileSize, 1, file);

    // Now that we have the entire file buffered, we can take a look at some binary infomation
    // Lets take a look in hexadecimal
    for (int i = 0; i < 100; i++)
        printf("%X ", fileBuf[i]);

    cin.get();
    delete[]fileBuf;
        fclose(file);   // Almost forgot this
    return 0;
}

Result:

6C 1B 1 DC F E7 F 6B 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A1 D 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

The result of xxd /tmp/test.bed:

0000000: 6c1b 01dc 0fe7 0f6b 01                   l......k.

The result of ls -l /tmp/test.bed

-rw-rw-r-- 1 user user 9 Nov  3 16:37 test.bed

The second method gives the right hex codes at the beginning but seems to get the file size wrong; the first method messes up the bytes.

These methods look very different. Perhaps there are many ways to do the same thing in C++? Is there an idiom that professionals adopt?

Upvotes: 1

Views: 2605

Answers (4)

qed

Reputation: 23134

While investigating why @Roland Illig's answer (now deleted) does not work, I found the following solution. I am not sure it is up to professional standards, but it gives the right results so far and allows checking the first n bytes of a file:

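#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>


int main(int argc, const char *argv[])
{
    if (argc < 3) {
        ::std::cerr << "usage: " << argv[0] << " <filename> <nbytes>\n";
        return 1;
    }

    // Note: std::stoi throws if argv[2] is not a number
    const int nbytes = std::stoi(argv[2]);
    if (nbytes <= 0) {
        ::std::cerr << "nbytes must be positive\n";
        return 1;
    }
    // std::vector instead of a variable-length array, which is not standard C++
    std::vector<char> buffer(nbytes);

    std::ifstream readingFile(argv[1], std::ios::binary);
    readingFile.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    // gcount() is the number of bytes actually read, which may be fewer than requested
    const std::streamsize bytesread = readingFile.gcount();
    if (bytesread > 0) {
        for (std::streamsize i = 0; i < bytesread; i++) {
            // Convert to unsigned char first so bytes above 0x7f are not sign-extended
            const unsigned char rawchar = static_cast<unsigned char>(buffer[i]);
            std::printf("%02x ", static_cast<unsigned int>(rawchar));
        }
        std::printf("\n");
    }

    return 0;
}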

Another answer, which I got from wibit.com:

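#include <cstdio>
#include <iostream>
#include <fstream>
using namespace std;

int main(int argc, const char* argv[])
{
  if (argc < 2) {
    cerr << "usage: " << argv[0] << " <filename>\n";
    return 1;
  }
  ifstream inBinaryFile;
  inBinaryFile.open(argv[1], ios_base::binary);
  // get() returns the byte as a non-negative int, or a negative value (EOF) at end of file
  int currentByte = inBinaryFile.get();
  while(currentByte >= 0)
  {
    printf("%02x ", currentByte);
    currentByte = inBinaryFile.get();
  }
  printf("\n");
  inBinaryFile.close();
  return 0;
}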

Upvotes: 0

Dietmar Kühl

Reputation: 154025

You certainly want to convert the char objects to unsigned char before processing them as integer values! The problem is that char may be signed, in which case negative values are converted to negative ints when you cast them. Negative ints displayed as hex will have more than two hex digits, the leading ones probably all "f".
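
To illustrate, here is a minimal sketch of that effect. It assumes a platform where int is 32 bits, and the helper names hex_naive and hex_fixed are made up for this example. It uses signed char explicitly so that it behaves the same whether or not plain char is signed on your compiler:

```cpp
#include <cstdio>
#include <string>

// Formats a byte the way the first snippet effectively does: the signed
// value is widened to int (sign-extended) and then printed as unsigned hex.
// Assumes a 32-bit int.
std::string hex_naive(signed char c)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "%X", static_cast<unsigned int>(c));
    return buf;
}

// Converts to unsigned char first, so the value stays in the range 0..255
// and never needs more than two hex digits.
std::string hex_fixed(signed char c)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02X",
                  static_cast<unsigned int>(static_cast<unsigned char>(c)));
    return buf;
}
```

Under those assumptions, the byte 0xDC from the question (which is -36 as a signed char) should come out as FFFFFFDC from hex_naive and as DC from hex_fixed, while bytes up to 0x7f are unaffected by the conversion.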

I didn't immediately spot why the second approach gets the size wrong. However, the C++ approach to reading a binary file is simple:

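#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <vector>

int main(int argc, const char *argv[])
{
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <filename>\n";
        return 1;
    }

    std::vector<unsigned char> bytes;
    {
        // The file name is taken from the command line here
        std::ifstream in(argv[1], std::ios_base::binary);
        // istreambuf_iterator reads raw characters, so no whitespace is skipped
        bytes.assign(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
    }
    std::cout << std::hex << std::setfill('0');
    // Each unsigned char is widened to int so it prints as a number
    for (int v: bytes) {
        std::cout << std::setw(2) << v << ' ';
    }
    std::cout << '\n';
}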

Upvotes: 1

Matteo Italia

Reputation: 126927

Both your methods are a strange mix of C and C++ (well, actually the second is just plain C). Still, the first method is mostly right, but you have to convert c to unsigned char before printing it; otherwise, where char is signed, any byte over 0x7f is read as negative, which results in that wrong output. [1]

To do things correctly and in the "C++ way", you should have done:

// needs #include <iomanip>
std::cout<<std::hex<<std::setfill('0');

...

   if (in)
      std::cout << std::setw(2) << int(static_cast<unsigned char>(c)) << "\n";

The second one gets the "signedness" correct, but it's mostly just C. A quick fix would be to replace the hard-coded 100 in the for loop with fileSize. But in general, loading the whole file into memory just to dump its content in hexadecimal is a poor idea; what you normally do is read the file a piece at a time into a fixed-size buffer and convert it as you go.


  1. in.get(c) stores the byte in a char; if its value is bigger than 0x7f it does not fit in a signed char and typically ends up as some negative value. Then when it is passed to printf it gets sign-extended (since any integer argument narrower than int passed to a variadic function is promoted to int) but is interpreted as an unsigned int because of the %X conversion. (All this assumes two's complement arithmetic, non-signaling integer overflow and a signed char.)

Upvotes: 1

dnk

Reputation: 661

In the first case you're printing a char (which is signed on your platform), while in the second case you're printing an unsigned char. Both are promoted to int when passed to printf, and for the first case the sign extension of negative values is what causes the difference.

Upvotes: 0
