Reputation: 13
My high-level goal is to convert any string (which may include non-ASCII characters) into a vector of integers by converting each byte of the string to an integer.
I already have a Python snippet for this purpose:
bytes = list(text.encode())
Now I want to have a C++ equivalent. I tried something like
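#include <cstring>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::string inputText = "testΔ";  // assumes the source file is saved as UTF-8
    char const* bytes = inputText.c_str();
    long bytesLen = std::strlen(bytes);
    // Each char is widened to long; bytes >= 0x80 come out negative where char is signed
    auto vec = std::vector<long>(bytes, bytes + bytesLen);
    for (auto number : vec) {
        std::cout << number << std::endl;
    }
    return 0;
}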
For an input string like "testΔ", the Python code outputs [116, 101, 115, 116, 206, 148].
However, the C++ code outputs [116, 101, 115, 116, -50, -108].
How should I change the C++ code to make them consistent?
Upvotes: 1
Views: 204
Reputation: 342
You can iterate over the contents of a std::string directly; there is no need to convert it to a std::vector. Try this:
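#include <iostream>
#include <string>

int main()
{
    std::string str = "abc";
    for (auto c : str)
    {
        // Go through unsigned char first so that bytes >= 0x80 are not sign-extended
        std::cout << static_cast<unsigned int>(static_cast<unsigned char>(c)) << std::endl;
    }
}
The cast is needed because the standard operator<< prints a char as a character, not as a number. Otherwise, you can work with it just like any other integral type. We cast to unsigned char first and only then widen to unsigned int, because the signedness of char is implementation-defined: converting a negative char directly to unsigned int would yield a very large value instead of one in the range 0 to 255.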
Upvotes: 1
Reputation: 61519
However, the C++ code outputs [116, 101, 115, 116, -50, -108].
In C++, the char type is distinct from both signed char and unsigned char, and it is implementation-defined whether it is signed.
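Both points can be checked at compile time and at run time. The following is a minimal sketch using standard type traits; the function name plain_char_is_signed is hypothetical and chosen only for illustration.

```cpp
#include <limits>
#include <type_traits>

// char is its own type, even though it shares its representation
// with either signed char or unsigned char.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");

// Hypothetical helper: reports whether plain char is signed
// on the implementation this code is compiled for.
bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}
```

The result of plain_char_is_signed() depends on the compiler and target. On implementations where it returns true, bytes of 0x80 and above show up as negative numbers when a char is widened to a larger integer type.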
You therefore want a const unsigned char*, but the .c_str() method returns a const char*, so you need a cast. This requires reinterpret_cast or a C-style cast; static_cast will not convert directly between these two pointer types.
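A minimal sketch of that cast applied to the question's code follows. The helper name to_byte_values is hypothetical, and the sketch uses data() and size() rather than c_str() and strlen so that embedded null bytes are not cut off.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the bytes of `text` as values in [0, 255].
std::vector<long> to_byte_values(const std::string& text) {
    // View the string's storage as unsigned char so that each byte
    // is read as a non-negative value before being widened to long.
    const unsigned char* first =
        reinterpret_cast<const unsigned char*>(text.data());
    return std::vector<long>(first, first + text.size());
}
```

If the string holds the UTF-8 encoding of "testΔ", the returned vector contains 116, 101, 115, 116, 206, 148, matching the Python output.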
Upvotes: 2
Reputation: 238351
How should I change the C++ code to make them consistent?
The difference appears to be that Python produces values in the range 0 to 255, like unsigned char, while char is signed in your C++ implementation. One solution: reinterpret the string as an array of unsigned char.
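Where a pointer cast is not wanted, converting each element through unsigned char gives the same values. This is a different technique from reinterpreting the array, shown here only as a sketch; the name byte_values_of is hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: converts every char to its byte value in [0, 255].
std::vector<int> byte_values_of(const std::string& text) {
    std::vector<int> values;
    values.reserve(text.size());
    std::transform(text.begin(), text.end(), std::back_inserter(values),
                   [](char c) {
                       // Converting to unsigned char maps negative values
                       // into 128..255 before widening to int.
                       return static_cast<int>(static_cast<unsigned char>(c));
                   });
    return values;
}
```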
Upvotes: 0