mrchance

Reputation: 1175

How to replace all occurrences of a specific ASCII char in a std::string with a Unicode char

How do I replace each occurrence of a specific ASCII character in a std::string with a Unicode character?

I'm trying this (using an em dash as an example):

string mystring;
replace(mystring.begin(), mystring.end(), ' ', '—'); // error: 2nd char is too wide for char
replace(mystring.begin(), mystring.end(), " ", "—"); // error: replace() does not exist

I could of course write a loop, but I was hoping there would be a single standard function for this. I'm aware that the modified string will be longer than the original string.

This seems like a silly, basic problem, but an hour of googling turned up nothing.

Upvotes: 0

Views: 688

Answers (2)

Remy Lebeau

Reputation: 596497

std::string only knows about arbitrary char elements, not what those chars actually represent. It is your responsibility to decide which charset the std::string's content is encoded in, and then encode the Unicode character in that same charset. For example, U+2014 EM DASH is 3 chars in UTF-8 (0xE2 0x80 0x94), but only 1 char (0x97) in the Windows-125x charsets.
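To illustrate where bytes like 0xE2 0x80 0x94 come from, here is a minimal UTF-8 encoder sketch. The name `to_utf8` is hypothetical, and it assumes the input is a valid Unicode scalar value (not a surrogate, at most U+10FFFF); it performs no validation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid scalar value (not a surrogate, <= 0x10FFFF).
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                        // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

U+2014 falls in the 3-byte range, so `to_utf8(U'\u2014')` produces "\xE2\x80\x94", which is what you would use as the replacement in a UTF-8 encoded std::string.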

You can use the std::string::find() method to find the index of the 1-char ASCII character, and then use the std::string::replace() method to substitute in the char-encoded Unicode character, e.g.:

string mystring = ...;
string replacement = ...; // "\xE2\x80\x94", "\x97", etc...
string::size_type pos = 0;
while ((pos = mystring.find(' ', pos)) != string::npos) {
    mystring.replace(pos, 1, replacement);
    pos += replacement.size();
}
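Wrapped up as a reusable function, the same find()/replace() loop might look like the sketch below. The name `replace_all_char` is hypothetical, and `replacement` is assumed to be already encoded in the string's charset (e.g. UTF-8). Advancing `pos` past the inserted chars is what prevents rescanning them, which matters if the replacement itself contains the searched character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replace every occurrence of `ch` in `s` with `replacement`.
// `replacement` must already be encoded in whatever charset `s` uses.
void replace_all_char(std::string& s, char ch, const std::string& replacement) {
    std::string::size_type pos = 0;
    while ((pos = s.find(ch, pos)) != std::string::npos) {
        s.replace(pos, 1, replacement);  // swap 1 char for replacement.size() chars
        pos += replacement.size();       // skip past what was just inserted
    }
}
```

For example, calling it on "a b c" with `' '` and "\xE2\x80\x94" yields a 9-byte UTF-8 string with em dashes in place of the spaces.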

Upvotes: 1

mrchance

Reputation: 1175

Boost does the trick:

#include <boost/algorithm/string/replace.hpp>
...
boost::replace_all(mystring, " ", "—");
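If Boost is not available, a rough standard-library stand-in for replacing a multi-char search string is sketched below. The name `replace_all_substr` is hypothetical; it replaces non-overlapping matches left to right, and treating an empty search string as "nothing to replace" is a choice made here to avoid an endless loop, not something taken from Boost's documentation.

```cpp
#include <cassert>
#include <string>

// Hypothetical standard-library-only helper: replaces every non-overlapping
// occurrence of `from` in `s` with `to`, scanning left to right.
void replace_all_substr(std::string& s, const std::string& from, const std::string& to) {
    if (from.empty()) return;            // an empty pattern would match forever
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();                // continue after the inserted text
    }
}
```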

https://www.boost.org/doc/libs/1_47_0/doc/html/boost/algorithm/ireplace_all.html

Alternatively, using only the standard library (although more verbose):

#include <iterator>
#include <regex>
...
std::string tmp;
std::regex_replace(std::back_inserter(tmp), mystring.begin(), mystring.end(), std::regex(" "), "—");
mystring = tmp;
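std::regex_replace also has an overload that takes and returns a std::string, which avoids the temporary and the output iterator. A sketch follows; the wrapper name `replace_all_regex` is hypothetical. Keep in mind the pattern is a regular expression (so characters like `.` must be escaped), and in the replacement `$` is special and must be written as `$$`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical wrapper around the std::string-returning std::regex_replace
// overload: returns a copy of `s` with every match of `pattern` replaced.
std::string replace_all_regex(const std::string& s, const std::string& pattern,
                              const std::string& replacement) {
    return std::regex_replace(s, std::regex(pattern), replacement);
}
```

Applied to the question, this amounts to `mystring = std::regex_replace(mystring, std::regex(" "), "—");`. For a single fixed character, a regex is heavier than the find()/replace() loop, so the loop may be preferable in hot code.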

Upvotes: 0
