I believe the output has to do with UTF-8, but I do not understand how. Why does _s2.size() return 6 when "abcdé" looks like 5 characters? Could someone please explain?
#include <iostream>
#include <cstdint>
#include <iomanip>
#include <string>
int main()
{
    std::cout << "sizeof(char) = " << sizeof(char) << std::endl;
    std::cout << "sizeof(std::string::value_type) = " << sizeof(std::string::value_type) << std::endl;
    std::string _s1 ("abcde");
    std::cout << "s1 = " << _s1 << ", _s1.size() = " << _s1.size() << std::endl;
    std::string _s2 ("abcdé");
    std::cout << "s2 = " << _s2 << ", _s2.size() = " << _s2.size() << std::endl;
    return 0;
}
The output is:
sizeof(char) = 1
sizeof(std::string::value_type) = 1
s1 = abcde, _s1.size() = 5
s2 = abcdé, _s2.size() = 6
g++ --version prints:
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609
Qt Creator compiles it like this:
g++ -c -m32 -pipe -g -std=c++0x -Wall -W -fPIC -I../strsize -I. -I../../Qt/5.5/gcc/mkspecs/linux-g++-32 -o main.o ../strsize/main.cpp
g++ -m32 -Wl,-rpath,/home/rodrigo/Qt/5.5/gcc -o strsize main.o
Thanks a lot!
Even in C++11, std::string has nothing to do with UTF-8. The description of the size and length methods of std::string says:
For std::string, the elements are bytes (objects of type char), which are not the same as characters if a multibyte encoding such as UTF-8 is used.
Thus, to handle Unicode strings, you should use a third-party Unicode-aware library.
If you keep using non-Unicode string classes for Unicode text, you may run into many other problems. For example, comparing a precomposed character (é as the single code point U+00E9) with the same-looking combining sequence (e followed by U+0301 COMBINING ACUTE ACCENT) gives a bogus result: they compare as different.
gcc's default input character set is UTF-8. Your editor also probably saved the file as UTF-8, so in your .cpp file the string abcdé occupies 6 bytes (as Peter already answered, LATIN SMALL LETTER E WITH ACUTE, U+00E9, is encoded in UTF-8 as the 2 bytes 0xC3 0xA9). gcc's default execution character set (-fexec-charset) is also UTF-8, so those same 6 bytes end up in the string literal, and std::string::length returns the length in bytes, i.e. 6. QED
You should open your source .cpp file in a hex editor to confirm.