Reputation: 7048
I use the following function to read an ASCII file into a std::string:
#include <cerrno>
#include <fstream>
#include <iterator>
#include <streambuf>
#include <string>

std::string get_file_contents(const char *filename)
{
    // Open in binary mode so the bytes are read exactly as stored.
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    if (in)
    {
        // Copy every byte from the stream into the string.
        return std::string((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    }
    throw errno;
}
Will this also work for reading a UTF-8 file into a std::string, or does that need any special settings?
Upvotes: 0
Views: 1009
Reputation: 1978
Yes, reading a UTF-8 file this way is fine: the file is just a sequence of bytes, and std::string stores those bytes unchanged. The encoding only matters when you further process, convert, or output the text.
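To illustrate where the encoding starts to matter, here is a minimal sketch of counting characters. The helper name utf8_code_point_count is hypothetical, and it assumes the string already holds valid UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to hold valid UTF-8.
std::size_t utf8_code_point_count(const std::string &text)
{
    std::size_t count = 0;
    for (unsigned char byte : text)
    {
        // Continuation bytes look like 10xxxxxx; every other byte starts a code point.
        if ((byte & 0xC0) != 0x80)
        {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 bytes of "héllo" (where é is stored as C3 A9), size() reports 6 bytes while this helper reports 5 code points.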
One potential pitfall is the BOM (https://en.wikipedia.org/wiki/Byte_order_mark). If your text file starts with a BOM (the bytes EF BB BF in UTF-8), you may want to remove it from the string or otherwise handle it. UTF-8 has no need for a BOM, but some software writes one anyway, presumably to mark the file's encoding. Notepad on Windows, for example, saves a BOM (have Notepad save the file with UTF-8 encoding and open it in a binary editor to see it).
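Removing a leading UTF-8 BOM (the bytes EF BB BF) could be sketched roughly like this. The function name strip_utf8_bom is hypothetical, and it assumes the whole file has already been read into a std::string.

```cpp
#include <string>

// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF), if present.
std::string strip_utf8_bom(std::string text)
{
    static const char bom[] = "\xEF\xBB\xBF";
    // compare() only returns 0 when the first three bytes match the BOM exactly.
    if (text.compare(0, 3, bom) == 0)
    {
        text.erase(0, 3);
    }
    return text;
}
```

It could be applied directly to the result of the read, for example strip_utf8_bom(get_file_contents(filename)). Strings without a BOM are returned unchanged.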
Upvotes: 2