Kelv

Reputation: 81

C++ How can I retrieve a file BOM to get its type of encoding?

I don't know if it's possible, but is there a way to read the first 4 bytes of a file (most likely the BOM) in order to determine its encoding (UTF-8, UTF-16LE, CP1252, etc.)? If the selected file were encoded in UTF-8, the values stored in an array "tabBytes[]" would be something like:

    tabBytes[0] = 0xEF
    tabBytes[1] = 0xBB
    tabBytes[2] = 0xBF
    tabBytes[3] = XXXX

Thanks for taking the time to help me! I look forward to reading your comments and answers.

EDIT: I'm new to C++, so the code I wrote before is probably wrong, thus I removed it.

FINAL EDIT: Finally I found a solution to my problem, thanks to those who helped me!

Upvotes: 1

Views: 165

Answers (1)

Jonathan Potter

Reputation: 37142

Array indices start at 0, so a buffer of fourBytes elements has valid indices 0 through fourBytes - 1, and buffer[fourBytes] = '\0'; writes one element past the end. You need to allocate fourBytes + 1 bytes if you want to do that. This should stop the crash you're getting when you delete the buffer.

However the only reason for null-terminating the buffer like that is if you want to treat it as a C-style string (e.g. to print it out), which you don't seem to be doing. You're copying it into tabBytes, but you're not copying the null-terminator. So it's unclear exactly what it is you're trying to achieve.
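If you do want a null-terminated copy, a minimal sketch of the allocation could look like this. The function name copyWithTerminator is made up for illustration, and fourBytes is assumed to be the byte count from your original code; std::unique_ptr is used here only so the buffer frees itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: copies fourBytes bytes from src into a freshly
// allocated buffer and appends a null terminator.
std::unique_ptr<char[]> copyWithTerminator(const char* src, std::size_t fourBytes) {
    assert(src != nullptr);
    // An array of N elements has valid indices 0..N-1, so writing to
    // buffer[fourBytes] requires fourBytes + 1 elements.
    std::unique_ptr<char[]> buffer(new char[fourBytes + 1]);
    std::memcpy(buffer.get(), src, fourBytes);
    buffer[fourBytes] = '\0';  // last valid index, no overrun
    return buffer;             // released with delete[] automatically
}
```

The terminator only matters if the result is passed to something that expects a C-style string, such as std::strlen or printf with %s.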

Your overall logic for reading the first few bytes from the file is fine, although based on the code above you could just read the data straight into tabBytes and do away with the allocation/copy/free of buffer.
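As a rough sketch of reading straight into tabBytes with no intermediate buffer, something like the following should work. The names readFirstBytes and detectBom are made up for this example, tabBytes is assumed to be a 4-byte array, and the BOM table covers only the common Unicode signatures. Encodings such as CP1252 have no BOM at all, so they come back as "unknown".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: reads up to 4 bytes from the start of the file
// directly into tabBytes. Returns the number of bytes actually read,
// which is less than 4 for a short file and 0 if the file cannot be opened.
std::size_t readFirstBytes(const std::string& path,
                           std::array<unsigned char, 4>& tabBytes) {
    tabBytes.fill(0);
    std::ifstream file(path, std::ios::binary);  // binary: no newline translation
    if (!file) return 0;
    file.read(reinterpret_cast<char*>(tabBytes.data()),
              static_cast<std::streamsize>(tabBytes.size()));
    return static_cast<std::size_t>(file.gcount());
}

// Hypothetical helper: names the BOM found in the first n bytes of b.
// UTF-32LE is tested before UTF-16LE because both start with FF FE.
std::string detectBom(const std::array<unsigned char, 4>& b, std::size_t n) {
    assert(n <= b.size());
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
    if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) return "UTF-32LE";
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) return "UTF-32BE";
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "UTF-16LE";
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "UTF-16BE";
    return "unknown";  // no BOM, or an encoding that never has one
}
```

Usage would be along the lines of declaring std::array<unsigned char, 4> tabBytes, calling readFirstBytes(path, tabBytes), and passing the array and the returned count to detectBom. Keep in mind that a missing BOM does not rule out UTF-8, since many UTF-8 files are written without one.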

Upvotes: 4
