MBZ

Reputation: 27632

C++ UTF-8 actual string length

Is there any native (cross-platform) function in the C++ standard library that returns the actual length of a std::string?

Update: as we know, std::string::length() returns the number of bytes, not the number of characters. I already have a custom function that returns the actual count, but I'm looking for a standard one.

Upvotes: 2

Views: 5466

Answers (3)

Mihai Nita

Reputation: 5787

There is no way to do that in C or C++ without third-party libraries. Even if you convert to char32_t, you will get code points, not characters.

A code point does not match the user's perception of a character because of things like decomposed forms, ligatures, and variation selectors.
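To illustrate the gap between code points and perceived characters, here is a minimal sketch that counts code points by skipping UTF-8 continuation bytes. The function name utf8_code_points is hypothetical, and the code assumes the input is already valid UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string.
// Assumes valid UTF-8. Every byte that is NOT a continuation byte
// (bit pattern 10xxxxxx) starts a new code point.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

With this sketch, the precomposed "é" ("\xC3\xA9") counts as one code point, while the decomposed form, "e" followed by a combining acute accent ("e\xCC\x81"), counts as two, even though a user sees a single character in both cases.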

The closest available construct to a "user character" is a "grapheme cluster" (see http://www.unicode.org/reports/tr29/)

Your best cross-platform option is ICU4C (http://site.icu-project.org/)

Upvotes: 1

Pavel Radzivilovsky

Reputation: 19104

The actual length is the number of bytes. There is very little meaning in counting code points. You may, though, want to count other things, such as grapheme clusters.

See more about the different kinds of string length at http://utf8everywhere.org

Upvotes: 1

Ben Voigt

Reputation: 283883

codecvt ought to be helpful: the Standard provides implementations for UTF-8. For example, codecvt_utf8<char32_t> would be appropriate in this case.

Probably something like:

std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>().from_bytes(the_std_string).size()
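Spelled out with its headers, the one-liner above might look like the following sketch. The wrapper name code_points_via_codecvt is hypothetical. Note that <codecvt> and std::wstring_convert were deprecated in C++17, so compilers may warn, and from_bytes throws std::range_error on malformed input.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical wrapper: decodes UTF-8 into UTF-32 and returns the
// number of char32_t elements, i.e. the number of code points.
// Throws std::range_error if the input is not valid UTF-8.
std::size_t code_points_via_codecvt(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    return converter.from_bytes(utf8).size();
}
```

This still counts code points rather than grapheme clusters, so a base letter followed by a combining mark is reported as two.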

Upvotes: 6
