ireneisgood

Reputation: 13

How to properly convert std::string to an integer vector

My high-level goal is to convert any string (which may include non-ASCII characters) into a vector of integers by converting each character to an integer.

I already have a Python snippet for this purpose:

bytes = list(text.encode())

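Now I want a C++ equivalent. I tried something like:

#include <cstring>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Assumed sample input (UTF-8 encoded source file)
    std::string inputText = "testΔ";
    char const* bytes = inputText.c_str();
    long bytesLen = std::strlen(bytes);
    auto vec = std::vector<long>(bytes, bytes + bytesLen);
    for (auto number : vec) {
        std::cout << number << std::endl;
    }
    return 0;
}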

For an input string like "testΔ", the Python code outputs [116, 101, 115, 116, 206, 148].

However, the C++ code outputs [116, 101, 115, 116, -50, -108].

How should I change the C++ code to make them consistent?

Upvotes: 1

Views: 204

Answers (3)

jhkouy78reu9wx

Reputation: 342

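You can iterate over the contents of a std::string directly; there is no need to convert it to a std::vector. Try this:

#include <iostream>
#include <string>

int main()
{
    std::string str = "abc";
    for (auto c : str)
    {
        // Go through unsigned char first so that bytes above 127 stay in 0..255.
        std::cout << static_cast<unsigned int>(static_cast<unsigned char>(c)) << std::endl;
    }
}

The cast is needed because the standard operator<< prints a char as a character, not as a number. Otherwise, you can work with it just like any other integral type. The inner cast to unsigned char ensures that the output is non-negative, since the signedness of char is implementation-defined; casting a negative char directly to unsigned int would wrap around to a very large value instead.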

Upvotes: 1

Karl Knechtel

Reputation: 61519

However, the C++ code outputs [116, 101, 115, 116, -50, -108].

In C++, char is a distinct type from both signed char and unsigned char, and whether it is signed is implementation-defined.

You therefore want an unsigned char const*, but .c_str() returns a char const*, so you need a cast. That requires reinterpret_cast or a C-style cast; static_cast will not work.
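A minimal sketch of that cast, assuming the input is held in a std::string. The helper name toByteValues is hypothetical and not part of the question's code; it keeps the question's std::vector<long> result type.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies the bytes of `text` into a vector of long,
// reading each byte through an unsigned char pointer.
std::vector<long> toByteValues(const std::string& text) {
    // reinterpret_cast is required here; static_cast cannot convert
    // between char const* and unsigned char const*.
    auto const* first = reinterpret_cast<unsigned char const*>(text.data());
    // size() is used instead of strlen() so embedded '\0' bytes are kept.
    return std::vector<long>(first, first + text.size());
}
```

Because every element is read as an unsigned char before being widened to long, the stored values fall in the range 0 to 255 regardless of whether char is signed on the platform.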

Upvotes: 2

eerorika

Reputation: 238351

How should I change the C++ code to make them consistent?

The difference appears to be that Python's bytes values are unsigned (0 to 255), while char is signed in your C++ implementation. One solution: reinterpret the string as an array of unsigned char.
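As a variation on this idea, the sketch below converts each char to unsigned char individually instead of reinterpreting the whole buffer, which avoids the pointer cast altogether. The helper name byteValues is hypothetical, and the result type std::vector<int> is an assumption chosen to mirror Python's list of ints.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: maps each char of `text` to its byte value in 0..255.
std::vector<int> byteValues(const std::string& text) {
    std::vector<int> out;
    out.reserve(text.size());
    std::transform(text.begin(), text.end(), std::back_inserter(out),
                   // Converting to unsigned char first keeps the value
                   // non-negative even when char is signed.
                   [](char c) -> int { return static_cast<unsigned char>(c); });
    return out;
}
```

For a UTF-8 encoded input this should yield the same sequence as Python's list(text.encode()), since encode() defaults to UTF-8.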

Upvotes: 0
