Samir

Reputation: 4193

C++ Convert string (or char*) to wstring (or wchar_t*)

string s = "おはよう";
wstring ws = FUNCTION(s, ws);

How would I assign the contents of s to ws?

I searched Google and tried several techniques, but none of them assigns the exact content. The result is distorted.

Upvotes: 237

Views: 428250

Answers (20)

Johann Gerell

Reputation: 25581

NOTE! See Note (2023-10-05) at the bottom!

Assuming that the input string in your example (おはよう) is a UTF-8 encoded (which it isn't, by the looks of it, but let's assume it is for the sake of this explanation :-)) representation of a Unicode string of your interest, then your problem can be fully solved with the standard library (C++11 and newer) alone.

The TL;DR version:

#include <locale>
#include <codecvt>
#include <string>

std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::string narrow = converter.to_bytes(wide_utf16_source_string);
std::wstring wide = converter.from_bytes(narrow_utf8_source_string);

Longer online compilable and runnable example:

(They all show the same example. There are just many for redundancy...)
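A longer example along those lines might look like the sketch below. The helper names utf8_to_wide and wide_to_utf8 are made up for illustration, and the code assumes the narrow string really is UTF-8. Since <codecvt> is deprecated as of C++17, expect deprecation warnings from newer compilers.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper names; both assume the narrow side is UTF-8.
std::wstring utf8_to_wide(const std::string& narrow)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    // Throws std::range_error if the input is not valid UTF-8.
    return converter.from_bytes(narrow);
}

std::string wide_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

Writing the input with explicit byte escapes, e.g. "\xE3\x81\x8A" for お, avoids depending on how the compiler encodes string literals in the source file.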

Note (old):

As pointed out in the comments and explained in https://stackoverflow.com/a/17106065/6345 there are cases when using the standard library to convert between UTF-8 and UTF-16 might give unexpected differences in the results on different platforms. For a better conversion, consider std::codecvt_utf8 as described on http://en.cppreference.com/w/cpp/locale/codecvt_utf8

Note (new):

Since the codecvt header is deprecated in C++17, some concerns about the solution presented in this answer were raised. However, the C++ standards committee added an important statement in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0618r0.html saying

this library component should be retired to Annex D, along side <codecvt>, until a suitable replacement is standardized.

So in the foreseeable future, the codecvt solution in this answer is safe and portable.

Note (2023-10-05):

There is a proposal to remove the deprecated codecvt and wstring_convert in C++26.
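For code that has to keep compiling without codecvt and wstring_convert, one possible fallback is to decode UTF-8 by hand. The following is only a sketch under that assumption, not a standard facility: the function name utf8_to_wstring_manual is hypothetical, it throws std::invalid_argument on malformed input, and it emits UTF-16 when wchar_t is 16 bits wide and UTF-32 otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of any library: decodes UTF-8 by hand.
std::wstring utf8_to_wstring_manual(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;    // decoded code point
        std::size_t extra = 0;   // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        static const std::uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

A maintained library such as Boost.Locale or ICU is probably the better choice for production code; the point here is only that the conversion itself does not depend on the deprecated header.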

Upvotes: 303

Chronial

Reputation: 70653

utf-8 implementation

Assuming that your std::string is utf8-encoded, this is a platform-independent implementation of wstring-string conversion functions:

#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

std::string wstring_to_utf8(std::wstring const& str)
{
  std::wstring_convert<std::conditional<
        sizeof(wchar_t) == 4,
        std::codecvt_utf8<wchar_t>,
        std::codecvt_utf8_utf16<wchar_t>>::type> converter;
  return converter.to_bytes(str);
}

std::wstring utf8_to_wstring(std::string const& str)
{
  std::wstring_convert<std::conditional<
        sizeof(wchar_t) == 4,
        std::codecvt_utf8<wchar_t>,
        std::codecvt_utf8_utf16<wchar_t>>::type> converter;
  return converter.from_bytes(str);
}

The currently most upvoted answer looks similar, but produces incorrect results for non-BMP characters (e.g. emojis such as 🚒) on non-Windows platforms. wchar_t is UTF-16 on Windows, but UTF-32 on most other platforms. The std::conditional takes care of that distinction.

MSVC Deprecation Warning

On MSVC this might generate deprecation warnings. You can disable them by wrapping the functions in

#pragma warning(push)
#pragma warning(disable : 4996)
<the two functions>
#pragma warning(pop)

Johann Gerell's answer explains why it's ok to disable that warning.

Getting utf-8 on msvc

Note that when you write a normal string in your source (like std::string s = "おはよう";), it won't be UTF-8 encoded by default on MSVC. I would strongly recommend setting your MSVC character set to UTF-8 to address this: https://learn.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-170

Upvotes: 1

Potatoswatter

Reputation: 137770

Your question is underspecified. Strictly, that example is a syntax error. However, std::mbstowcs is probably what you're looking for.

It is a C-library function and operates on buffers, but here's an easy-to-use idiom, courtesy of Mooing Duck:

std::wstring ws(s.size(), L' '); // Overestimate number of code points.
ws.resize(std::mbstowcs(&ws[0], s.c_str(), s.size())); // Shrink to fit.
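Note that std::mbstowcs returns (size_t)-1 when the input is not valid in the current C locale, which the two-line idiom above would pass straight to resize. A slightly more defensive sketch might look like this. The wrapper name mb_to_wide is hypothetical, and passing a null destination to query the length is a POSIX extension (also supported by MSVC), not something ISO C guarantees.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around std::mbstowcs. The multibyte encoding is that
// of the current C locale (LC_CTYPE), which the caller must have set, e.g.
// with std::setlocale(LC_ALL, "") at program start.
std::wstring mb_to_wide(const std::string& s)
{
    // First pass: with a null destination, mbstowcs returns the required
    // length, or (size_t)-1 if the input is invalid in the current locale.
    const std::size_t len = std::mbstowcs(nullptr, s.c_str(), 0);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("input is not valid in the current locale");

    std::wstring ws(len, L'\0');
    if (len > 0)
        std::mbstowcs(&ws[0], s.c_str(), len);  // second pass: convert
    return ws;
}
```

Like the original idiom, this stops at the first embedded null character, because mbstowcs works on C strings.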

Upvotes: 42

Michael Santos

Reputation: 610

For me the simplest option without much overhead is:

Include:

#include <atlbase.h>
#include <atlconv.h>

Convert:

const char* whatever = "test1234";
std::wstring lwhatever = std::wstring(CA2W(std::string(whatever).c_str()));

If needed:

lwhatever.c_str();

Upvotes: 6

Alen Wesker

Reputation: 245

You can use boost path or std path, which is a lot easier. Boost path is easier for cross-platform applications.

#include <boost/filesystem/path.hpp>

namespace fs = boost::filesystem;

//s to w
std::string s = "xxx";
auto w = fs::path(s).wstring();

//w to s
std::wstring w = L"xxx";
auto s = fs::path(w).string();

If you prefer to use std:

#include <filesystem>
namespace fs = std::filesystem;

//The same

For older C++ versions:

#include <experimental/filesystem>
namespace fs = std::experimental::filesystem;

//The same

Internally the path class still implements a converter, but you don't have to deal with the details.

Upvotes: 8

user8197171

Reputation:

Here is my super basic solution. It might not work for everyone, but it would work for a lot of people.

It requires the Guidelines Support Library (GSL), a pretty official C++ library that was designed by many C++ committee authors:

    std::string to_string(std::wstring const & wStr)
    {
        std::string temp = {};

        for (wchar_t const & wCh : wStr)
        {
            // If the string can't be converted gsl::narrow will throw
            temp.push_back(gsl::narrow<char>(wCh));
        }

        return temp;
    }

All my function does is perform the conversion if possible and throw an exception otherwise, via gsl::narrow (https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es49-if-you-must-use-a-cast-use-a-named-cast).

Upvotes: 2

vSzemkel

Reputation: 692

std::string -> wchar_t[] with safe mbstowcs_s function:

auto ws = std::make_unique<wchar_t[]>(s.size() + 1);
mbstowcs_s(nullptr, ws.get(), s.size() + 1, s.c_str(), s.size());

This is from my sample code

Upvotes: 0

Kadir Erdem Demir

Reputation: 3595

If you have Qt and don't want to implement a conversion function yourself, you can use

std::string str;
QString::fromStdString(str).toStdWString()

Upvotes: 4

lmiguelmh

Reputation: 3202

If you are using Windows/Visual Studio and need to convert a string to wstring you could use:

#include <AtlBase.h>
#include <atlconv.h>
...
string s = "some string";
CA2W ca2w(s.c_str());
wstring w = ca2w;
printf("%s = %ls", s.c_str(), w.c_str());

Same procedure for converting a wstring to string (sometimes you will need to specify a codepage):

#include <AtlBase.h>
#include <atlconv.h>
...
wstring w = L"some wstring";
CW2A cw2a(w.c_str());
string s = cw2a;
printf("%s = %ls", s.c_str(), w.c_str());

You could specify a codepage and even UTF8 (that's pretty nice when working with JNI/Java). A standard way of converting a std::wstring to a UTF-8 std::string is shown in this answer.

// 
// using ATL
CA2W ca2w(str, CP_UTF8);

// 
// or the standard way taken from the answer above
#include <codecvt>
#include <string>

// convert UTF-8 string to wstring
std::wstring utf8_to_wstring (const std::string& str) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.from_bytes(str);
}

// convert wstring to UTF-8 string
std::string wstring_to_utf8 (const std::wstring& str) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.to_bytes(str);
}

If you want to know more about codepages there is an interesting article on Joel on Software: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.

These CA2W (Convert Ansi to Wide=unicode) macros are part of ATL and MFC String Conversion Macros, samples included.

Sometimes you will need to disable security warning #4995. I don't know of another workaround (it happened to me when I compiled for Windows XP in VS2012).

#pragma warning(push)
#pragma warning(disable: 4995)
#include <AtlBase.h>
#include <atlconv.h>
#pragma warning(pop)

Edit: Well, another article says of the article by Joel: "while entertaining, it is pretty light on actual technical details". Article: What Every Programmer Absolutely, Positively Needs To Know About Encoding And Character Sets To Work With Text.

Upvotes: 26

Mark Lakata

Reputation: 20818

Here's a way to combine string, wstring and mixed string constants into a wstring. Use the wstringstream class.

This does NOT work for multi-byte character encodings. This is just a dumb way of throwing away type safety and expanding 7-bit characters from std::string into the lower 7 bits of each character of std::wstring. This is only useful if you have 7-bit ASCII strings and you need to call an API that requires wide strings.

#include <sstream>

std::string narrow = "narrow";
std::wstring wide = L"wide";

std::wstringstream cls;
cls << " abc " << narrow.c_str() << L" def " << wide.c_str();
std::wstring total= cls.str();

Upvotes: 23

Isma Rekathakusuma

Reputation: 1152

String to wstring

std::wstring Str2Wstr(const std::string& str)
{
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}

wstring to String

std::string Wstr2Str(const std::wstring& wstr)
{
    typedef std::codecvt_utf8<wchar_t> convert_typeX;
    std::wstring_convert<convert_typeX, wchar_t> converterX;
    return converterX.to_bytes(wstr);
}

Upvotes: 4

Matthias Ronge

Reputation: 10102

This variant of it is my favourite in real life. It converts the input, if it is valid UTF-8, to the respective wstring. If the input is corrupted, the wstring is constructed out of the single bytes. This is extremely helpful if you cannot really be sure about the quality of your input data.

std::wstring convert(const std::string& input)
{
    try
    {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
        return converter.from_bytes(input);
    }
    catch(std::range_error& e)
    {
        size_t length = input.length();
        std::wstring result;
        result.reserve(length);
        for(size_t i = 0; i < length; i++)
        {
            result.push_back(input[i] & 0xFF);
        }
        return result;
    }
}

Upvotes: 12

TarmoPikaro

Reputation: 5223

Based on my own testing (on Windows 8, VS2010), mbstowcs can actually damage the original string; it works only with the ANSI code page. MultiByteToWideChar/WideCharToMultiByte can also corrupt a string, but they tend to replace characters they don't know with '?' question marks, whereas mbstowcs tends to stop at the first unknown character and cut the string off at that point. (I tested Vietnamese characters on a Finnish Windows.)

So prefer the MultiByteToWideChar/WideCharToMultiByte Windows API functions over the analogous ANSI C functions.

I've also noticed that the shortest way to convert a string from one code page to another is not to call the MultiByteToWideChar/WideCharToMultiByte API functions directly but to use their analogous ATL macros: W2A / A2W.

So the function analogous to the one mentioned above would look like this:

wstring utf8toUtf16(const string & str)
{
   USES_CONVERSION;
   _acp = CP_UTF8;
   return A2W( str.c_str() );
}

_acp is declared in the USES_CONVERSION macro.

Here is another function that I often find missing when converting old data to a new format:

string ansi2utf8( const string& s )
{
   USES_CONVERSION;
   _acp = CP_ACP;
   wchar_t* pw = A2W( s.c_str() );

   _acp = CP_UTF8;
   return W2A( pw );
}

But please note that these macros use the stack heavily: don't call them in loops or in recursive functions. After using the W2A or A2W macro, it's better to return as soon as possible, so that the stack is freed from the temporary conversion.

Upvotes: 1

vladon

Reputation: 8401

using Boost.Locale:

ws = boost::locale::conv::utf_to_utf<wchar_t>(s);

Upvotes: 9

jaguar

Reputation: 25

Use this code to convert your string to a wstring:

std::wstring string2wString(const std::string& s){
    int len;
    int slength = (int)s.length() + 1;
    len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, 0, 0); 
    wchar_t* buf = new wchar_t[len];
    MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, buf, len);
    std::wstring r(buf);
    delete[] buf;
    return r;
}

int main(){
    std::string str="your string";
    std::wstring wStr=string2wString(str);
    return 0;
}

Upvotes: -1

Alex Che

Reputation: 7112

Windows API only, pre C++11 implementation, in case someone needs it:

#include <stdexcept>
#include <string>
#include <vector>
#include <windows.h>

using std::runtime_error;
using std::string;
using std::vector;
using std::wstring;

wstring utf8toUtf16(const string & str)
{
   if (str.empty())
      return wstring();

   size_t charsNeeded = ::MultiByteToWideChar(CP_UTF8, 0, 
      str.data(), (int)str.size(), NULL, 0);
   if (charsNeeded == 0)
      throw runtime_error("Failed converting UTF-8 string to UTF-16");

   vector<wchar_t> buffer(charsNeeded);
   int charsConverted = ::MultiByteToWideChar(CP_UTF8, 0, 
      str.data(), (int)str.size(), &buffer[0], buffer.size());
   if (charsConverted == 0)
      throw runtime_error("Failed converting UTF-8 string to UTF-16");

   return wstring(&buffer[0], charsConverted);
}

Upvotes: 25

Andreas Bonini

Reputation: 44742

string s = "おはよう"; is an error.

You should use wstring directly:

wstring ws = L"おはよう";

Upvotes: -3

hahakubile

Reputation: 7552

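The method s2ws below works well. Hope it helps.

#include <clocale>
#include <cstdlib>
#include <string>

std::wstring s2ws(const std::string& s) {
    // Save the current locale name, then switch to the user's default locale.
    std::string oldLocale = setlocale(LC_ALL, NULL);
    setlocale(LC_ALL, "");
    std::wstring result;
    // With a NULL destination, mbstowcs returns the required length,
    // or (size_t)-1 if the input is invalid in this locale.
    size_t len = mbstowcs(NULL, s.c_str(), 0);
    if (len != (size_t)-1 && len > 0) {
        result.resize(len);
        mbstowcs(&result[0], s.c_str(), len);
    }
    setlocale(LC_ALL, oldLocale.c_str()); // Restore the previous locale.
    return result;
}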

Upvotes: 1

Ghominejad

Reputation: 1798

From char* to wstring:

const char* str = "hello worlddd";
wstring wstr (str, str+strlen(str));

From string to wstring:

string str = "hello worlddd";
wstring wstr (str.begin(), str.end());

Note this only works well if the string being converted contains only ASCII characters.
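To make that limitation explicit rather than silently producing garbage, the widening can be guarded with a check. This is a minimal sketch; the name ascii_to_wstring is made up for illustration, and throwing on non-ASCII input is just one possible policy.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: widens char by char, but refuses non-ASCII input.
std::wstring ascii_to_wstring(const std::string& s)
{
    std::wstring out;
    out.reserve(s.size());
    for (char ch : s) {
        // Bytes above 0x7F belong to some multi-byte or legacy encoding
        // and cannot be widened without knowing which one.
        if (static_cast<unsigned char>(ch) > 0x7F)
            throw std::invalid_argument("non-ASCII byte in input");
        out.push_back(static_cast<wchar_t>(ch));
    }
    return out;
}
```

With this, ascii_to_wstring("hello worlddd") behaves like the iterator-range constructor above, while a UTF-8 string such as the one in the question raises an exception instead of being distorted.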

Upvotes: 20

Pietro M

Reputation: 1939

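// Widens each char to a wchar_t individually; no decoding takes place,
// so the result is only correct for 7-bit ASCII input.
int StringToWString(std::wstring &ws, const std::string &s)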
{
    std::wstring wsTmp(s.begin(), s.end());

    ws = wsTmp;

    return 0;
}

Upvotes: 59
