Reputation: 70030
I have read the posts below before asking here:
std::string, wstring, u16/32string clarification
std::u16string, std::u32string, std::string, length(), size(), codepoints and characters
But they don't answer my question. Look at the simple code below:
#include <iostream>
#include <string>
using namespace std;
int main()
{
    char16_t x[] = { 'a', 'b', 'c', 0 };
    u16string arr = x;
    cout << "arr.length = " << arr.length() << endl;
    for (auto i : arr)
        cout << i << "\n";
}
The output is:
arr.length = 3 // a + b + c
97
98
99
Given that std::u16string consists of char16_t and not char, shouldn't the output be:
arr.length = 2 // ab + c(\0)
<combining 'a' and 'b'>
99
Please excuse the novice question. I just want to get the concept of the new C++11 string types clear.
Edit:
From @Jonathan's answer, I see the flaw in my question. What I am really asking is how to initialize the char16_t array so that the length of arr becomes 2 (i.e. ab, c\0).
FYI, the following gives a different result:
char x[] = { 'a', 'b', 'c', 0 };
u16string arr = (char16_t*)x; // probably undefined behavior
Output:
arr.length = 3
25185
99
32767
Upvotes: 4
Views: 11133
Reputation: 48635
When you do:
char16_t x[] = { 'a', 'b', 'c', 0 };
It is similar to doing this (endianness notwithstanding):
char x[] = { '\0', 'a', '\0', 'b', '\0', 'c', '\0', '\0' };
Each character occupies two bytes in memory.
So when you ask for the length of a u16string, each two bytes count as one character. They are, after all, two-byte (16-bit) characters.
EDIT:
The code in your additional question creates a string without a null terminator.
Try this:
char x[] = { 'a', 'b', 'c', 0 , 0, 0};
u16string arr = (char16_t*)x;
Now the first character is {'a', 'b'}, the second character is {'c', 0}, and you also have a null terminator character {0, 0}.
Upvotes: 3
Reputation: 234584
In C++ you can build a 16-bit integer from two 8-bit integers like this:
char16_t ab = (static_cast<unsigned char>('a') << 8) | 'b';
// (Note: cast to unsigned meant to prevent overflows)
Upvotes: -1
Reputation: 1226
shouldn't the output be:
arr.length = 2 // ab + c(\0)
99
No.
The elements of x are char16_t, regardless of the fact that you initialize them with char literals:
#include<iostream>
int main () {
char16_t x[] = { 'a', 'b', 'c', 0 };
std::cout << sizeof(x[0]) << std::endl;
}
output:
2
Addendum, regarding the EDIT of the question:
I wouldn't exactly recommend casting away the null terminator of a string. ;)
#include<iostream>
#include<string>
int main () {
    char x[] = { 'a', 'b', 'c', 0, 0, 0, 0, 0 };
    std::wstring ws = reinterpret_cast<wchar_t*>(x);
    std::u16string u16s = reinterpret_cast<char16_t*>(x);
    std::cout << "sizeof(wchar_t): " << sizeof(wchar_t)
              << "\twide string length: " << ws.length()
              << std::endl;
    std::cout << "sizeof(char16_t): " << sizeof(char16_t)
              << "\tu16string length: " << u16s.length()
              << std::endl;
}
output (compiled with g++)
sizeof(wchar_t): 4 wide string length: 1
sizeof(char16_t): 2 u16string length: 2
As expected, isn't it?
Upvotes: 1
Reputation: 171383
No, you have created an array of four elements: the first element is 'a' converted to char16_t, the second is 'b' converted to char16_t, and so on.
Then you create a u16string from that array (converted to a pointer), which reads each element up to the null terminator.
Upvotes: 4