Reputation: 1175
How do I replace each occurrence of a specific ASCII character in a std::string with a Unicode character?
I'm trying this (using an em dash as an example):
string mystring;
replace(mystring.begin(), mystring.end(), ' ', '—'); // error: 2nd char is too wide for char
replace(mystring.begin(), mystring.end(), " ", "—"); // error: replace() does not exist
I could of course write a loop, but I was hoping there would be a single standard function available for this. I'm aware that the modified string will be longer than the original string.
Seems like a silly basic problem, but an hour of googling solved zilch.
Upvotes: 0
Views: 688
Reputation: 596497
std::string only knows about arbitrary char elements, but not what those chars actually represent. It is your responsibility to decide what charset the std::string's content will be encoded as, and then encode the Unicode character in that same charset. For example, in UTF-8, — (U+2014 EM DASH) is 3 chars: 0xE2 0x80 0x94, but in Windows-125x charsets it is only 1 char: 0x97.
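To make the size difference concrete, here is a minimal sketch that spells out both encodings as explicit byte escapes, so the result does not depend on how the source file itself is saved. The helper names em_dash_utf8 and em_dash_cp1252 are made up for illustration; they are not part of any library.

```cpp
#include <string>

// Hypothetical helper: U+2014 EM DASH encoded as UTF-8 (three bytes).
inline std::string em_dash_utf8() {
    return "\xE2\x80\x94";
}

// Hypothetical helper: the same character in Windows-1252 (one byte).
inline std::string em_dash_cp1252() {
    return "\x97";
}
```

Calling size() on the two results gives 3 and 1 respectively, which is why the replacement has to be a string rather than a single char once the target encoding is UTF-8.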
You can use the std::string::find() method to find the index of the 1-char ASCII character, and then use the std::string::replace() method to substitute in the char-encoded Unicode character, e.g.:
string mystring = ...;
string replacement = ...; // "\xE2\x80\x94", "\x97", etc...
string::size_type pos = 0;
while ((pos = mystring.find(' ', pos)) != string::npos) {
    mystring.replace(pos, 1, replacement);
    pos += replacement.size();
}
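The same loop can be wrapped in a small reusable function. This is only a sketch: the name replace_char_all and its parameters are assumptions for illustration, and the caller is still responsible for passing a replacement that is already encoded in the string's charset.

```cpp
#include <string>

// Hypothetical helper: replaces every occurrence of the single char `from`
// in `text` with the byte sequence `to` (e.g. a UTF-8 encoded character).
void replace_char_all(std::string& text, char from, const std::string& to) {
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, 1, to);
        // Skip past the inserted bytes so they are never rescanned.
        pos += to.size();
    }
}
```

For example, replace_char_all(mystring, ' ', "\xE2\x80\x94") swaps each space for a UTF-8 em dash. Advancing pos by to.size() also keeps the loop from running forever if the replacement happens to contain the character being replaced.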
Upvotes: 1
Reputation: 1175
Boost, oh yeah, does the trick:
#include <boost/algorithm/string/replace.hpp>
...
boost::replace_all(mystring, " ", "—");
https://www.boost.org/doc/libs/1_47_0/doc/html/boost/algorithm/ireplace_all.html
Alternatively (although more verbose), using only the standard library (<regex>):
string tmp;
std::regex_replace(back_inserter(tmp), mystring.begin(), mystring.end(), std::regex(" "), "—");
mystring = tmp;
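For what it's worth, std::regex_replace also has an overload that takes the whole string and returns a new one, which avoids the temporary and the iterators. A sketch, assuming the string is UTF-8 (the bytes are written as escapes so the source file's encoding doesn't matter); the function name spaces_to_em_dash is made up for this example:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns a copy of `text` with every space replaced
// by a UTF-8 encoded em dash (U+2014).
std::string spaces_to_em_dash(const std::string& text) {
    static const std::regex space(" ");  // compiled once, reused on each call
    return std::regex_replace(text, space, "\xE2\x80\x94");
}
```

Usage is then just mystring = spaces_to_em_dash(mystring). Keep in mind that the replacement argument is a format string, so a literal $ in it would be interpreted specially.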
Upvotes: 0