Clearer

Reputation: 2306

Reading utf-8 files to std::string in C++

Finally! We're starting to require that all our input files are encoded in utf-8! This is something we've been wanting to do for years. Unfortunately, we suck at it, since none of us has ever tried it. Most of us are either Windows programmers or are used to operating systems where utf-8 is the only real option anyway; neither group knows anything about reading utf-8 strings in a platform-agnostic way.

So we started to look at how to deal with utf-8 in a platform-agnostic way and found that it's pretty confusing (because Windows). The other questions I've found here on Stack Overflow either don't cover our scenario or are confusing themselves. I found a reference to https://www.codeproject.com/Articles/38242/Reading-UTF-with-C-streams which, I find, is a bit confusing and contains a great deal of fluff.

So a few assumptions (that must be true or we're in a state of GIGO)

We're trying to avoid using std::wstring if we can, and I see no reason to use it anyway. We're also trying to avoid any third-party libraries that do not use utf-8 encoded std::string; using a custom string type with overloaded functions that convert all std::string arguments to the custom string is acceptable.

Is there any way to do this using just the standard C++ library? Preferably just by imbuing the global locale with a facet that tells the stream library to dump the content of files into strings unchanged (using custom delimiters as usual); no conversion allowed.

This question is only about reading utf-8 files into std::strings and storing the content as utf-8 encoded strings. Dealing with Windows APIs and such is a separate concern.

C++17 is available.

Upvotes: 2

Views: 5100

Answers (1)

Nicol Bolas

Reputation: 473407

UTF-8 is just a sequence of bytes that follow a specific encoding. If you read a sequence of bytes that is legitimate UTF-8 data into a std::string, then the string contains UTF-8 data.

There's nothing special you have to actually do to make this happen. This works like any other C or C++ file loading. Just don't mess around with iostream locales and you'll be fine.
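
As a minimal sketch of what that looks like in practice (the helper names read_file_bytes and read_delimited are hypothetical, not standard library functions): open the file in binary mode so no newline translation happens, leave the stream's locale alone, and copy the bytes into a std::string. Splitting on a single-byte ASCII delimiter with std::getline is also safe, because in UTF-8 a byte below 0x80 never occurs inside a multi-byte sequence.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns the file's bytes unchanged.
// If the file holds valid UTF-8, so does the returned string.
std::string read_file_bytes(const std::string& path) {
    // Binary mode: no "\r\n" -> "\n" translation on Windows.
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open file: " + path);
    }
    // Copy every byte from the stream buffer; no locale, no conversion.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: splits the file on an ASCII delimiter byte.
// Assumes delim < 0x80, so it can never match part of a multi-byte
// UTF-8 sequence.
std::vector<std::string> read_delimited(const std::string& path, char delim) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open file: " + path);
    }
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Note that the path here is a narrow std::string; how non-ASCII file names are handled on Windows is a separate concern from the file's content. Also keep in mind that size() on the result counts bytes, not code points.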

Upvotes: 4
