Reputation: 43
When writing a program, I'm having issues combining special characters with regular ones. When I print either type to the console separately, they work fine, but when I print a special and a normal character on the same line, I get garbled characters instead of the expected output. My code:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
void initCharacterMap(){
const string normal = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!@#$%^&*()-_[]{};':\",.<>/?";
const string inverse = "∀𐐒Ↄ◖ƎℲ⅁HIſ⋊⅂WᴎOԀΌᴚS⊥∩ᴧMX⅄Zɐqɔpǝɟƃɥıɾʞʃɯuodbɹsʇnʌʍxʎz12Ɛᔭ59Ɫ860¡@#$%^⅋*)(-‾][}{؛,:„'˙></¿";
cout << normal << endl;
for(int i=0;i<normal.length();i++){
cout << normal[i];
}
cout << endl;
cout << inverse << endl;
for(int i=0;i<inverse.length();i++){
cout << inverse[i];
}
cout << endl;
for(int i=0;i<inverse.length();i++){
cout << normal[i] << inverse[i] << endl;
}
}
int main() {
initCharacterMap();
return 0;
}
And the console output: https://paste.ubuntu.com/p/H9bqh67WPZ/
When viewed in the console, the \XX characters show up as the unknown-character symbol, and when I opened that log, I was warned that some characters couldn't be displayed and that editing could corrupt the file.
If anyone has any advice on how I can fix this, it would be greatly appreciated.
EDIT: After following the suggestion in Marek R's answer, the situation greatly improved, but this still isn't quite giving me the results I want. New code:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
void initCharacterMap(){
const wchar_t normal[] = L"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!@#$%^&*()-_[]{};':\",.<>/?";
const wchar_t inverse[] = L"∀𐐒Ↄ◖ƎℲ⅁HIſ⋊⅂WᴎOԀΌᴚS⊥∩ᴧMX⅄Zɐqɔpǝɟƃɥıɾʞʃɯuodbɹsʇnʌʍxʎz12Ɛᔭ59Ɫ860¡@#$%^⅋*)(-‾][}{؛,:„'˙></¿";
wcout << normal << endl;
for(int i=0;i<sizeof(normal)/sizeof(normal[0]);i++){
wcout << normal[i];
}
wcout << endl;
wcout << inverse << endl;
for(int i=0;i<sizeof(inverse)/sizeof(inverse[0]);i++){
wcout << inverse[i];
}
wcout << endl;
for(int i=0;i<sizeof(inverse)/sizeof(inverse[0]);i++){
wcout << normal[i] << inverse[i] << endl;
}
}
int main() {
initCharacterMap();
return 0;
}
New console output: https://paste.ubuntu.com/p/hcM7JB99zj/
So, I no longer have issues printing characters from the two strings together, but now all non-ASCII characters are replaced with question marks in the output. Is there any way to make those characters print properly?
Upvotes: 3
Views: 3345
Reputation: 37657
Most probably your code is using UTF-8 encoding. This means that a single character can occupy from one to four bytes.
Note that the value of inverse.size() is bigger than you are expecting. std::string doesn't know anything about encoding, so it treats each byte as a character. The output console interprets the sequence of bytes according to its encoding and shows the proper characters.
When you print each string byte by byte on its own, it works because the byte sequence stays intact. When you print one byte from one string and then one byte from the other, the multi-byte sequences are broken up and things get messy.
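To make the byte-versus-character mismatch concrete, here is a minimal sketch that splits a UTF-8 string into one piece per code point. The helper name splitUtf8 is made up for this illustration, and it assumes the input is valid UTF-8 (there is no error handling for malformed sequences).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a UTF-8 encoded std::string into one
// std::string per code point. Assumes valid UTF-8; malformed input
// is not detected.
std::vector<std::string> splitUtf8(const std::string& text) {
    std::vector<std::string> codePoints;
    for (std::size_t i = 0; i < text.size();) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t len = 1;                      // 0xxxxxxx: ASCII, one byte
        if ((lead & 0xE0) == 0xC0) len = 2;       // 110xxxxx: two-byte sequence
        else if ((lead & 0xF0) == 0xE0) len = 3;  // 1110xxxx: three-byte sequence
        else if ((lead & 0xF8) == 0xF0) len = 4;  // 11110xxx: four-byte sequence
        codePoints.push_back(text.substr(i, len)); // substr clamps at the end
        i += len;
    }
    return codePoints;
}
```

For a string holding A, ∀ and 𐐒, size() reports 8 bytes while splitUtf8 returns 3 pieces. Pairing normal and inverse piece by piece, rather than byte by byte, keeps every multi-byte sequence intact.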
The easiest way to fix it is to use std::wstring, wchar_t and L"some literal". It should work in your case, but as pointed out in the comments below, on some platforms some characters may not fit into a single wide character.
If you want to know more, read about different character encodings.
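On the caveat that some characters may not fit into a single wide character: the sketch below compares how many elements a wide string and a char32_t string need for U+10412 (the 𐐒 used in inverse). The function names are invented for this illustration, and std::u32string is shown only as one possible alternative, not as something the original code uses.

```cpp
#include <cstddef>
#include <string>

// Number of wchar_t elements needed for U+10412.
// This is 1 where wchar_t is 32 bits wide and 2 where it is 16 bits
// wide, because the character is then stored as a surrogate pair.
std::size_t wideUnitsForU10412() {
    return std::wstring(L"\U00010412").size();
}

// With char32_t every code point occupies exactly one element,
// regardless of platform.
std::size_t utf32UnitsForU10412() {
    return std::u32string(U"\U00010412").size();
}
```

Where wideUnitsForU10412() returns 2, indexing normal[i] and inverse[i] in parallel would drift out of step after that character, which is the situation the comments warn about.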
The other way to solve your problem is to use a map which transforms one sequence of bytes (string) into another sequence (string). C++11:
auto dictionary = std::unordered_map<std::string, std::string> {
{ "A", "∀" },
{ "B", "𐐒" },
{ "C", "Ↄ" },
{ "D", "◖" },
… … …
};
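As a rough sketch of how such a dictionary could be applied, the function below replaces each character that has an entry and copies everything else unchanged. The name invertText is hypothetical, only the four entries shown above are included, and it assumes the source file is saved as UTF-8 and that every key is a single ASCII character (so walking the input byte by byte is safe).

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: maps each character of `text` through the
// dictionary. Characters without an entry are copied unchanged.
// Assumes keys are single ASCII characters.
std::string invertText(const std::string& text) {
    // Only the entries listed in the answer; the rest of the table is omitted.
    static const std::unordered_map<std::string, std::string> dictionary = {
        { "A", "∀" },
        { "B", "𐐒" },
        { "C", "Ↄ" },
        { "D", "◖" },
    };

    std::string result;
    for (char ch : text) {
        const std::string key(1, ch);            // one-character lookup key
        const auto it = dictionary.find(key);
        result += (it != dictionary.end()) ? it->second : key;
    }
    return result;
}
```

Because whole strings are appended, each multi-byte replacement reaches the console as one intact sequence, so std::cout << invertText("ABCD") needs neither wide characters nor per-byte indexing.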
On my Mac (with a Polish locale), when building with clang, the application ignores the inverted values (wcout goes into an invalid state), but when the locale is set, everything works as you expect.
#include <fstream>
#include <iostream>
#include <string>
#include <locale>
using namespace std;
void initCharacterMap(){
wcout.imbue(locale(""));
const auto normal = L"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!@#$%^&*()-_[]{};':\",.<>/?"s;
const auto inverse = L"∀𐐒Ↄ◖ƎℲ⅁HIſ⋊⅂WᴎOԀΌᴚS⊥∩ᴧMX⅄Zɐqɔpǝɟƃɥıɾʞʃɯuodbɹsʇnʌʍxʎz12Ɛᔭ59Ɫ860¡@#$%^⅋*)(-‾][}{؛,:„'˙></¿"s;
wcout << normal << endl;
for(auto ch : normal){
wcout << ch;
}
wcout << endl;
wcout << inverse << endl;
for(auto ch : inverse){
wcout << ch;
}
wcout << endl;
for(size_t i=0; i < inverse.length(); ++i){
wcout << normal[i] << inverse[i] << endl;
}
}
int main() {
initCharacterMap();
return 0;
}
https://wandbox.org/permlink/nTYi5RbZgZXclE5r
I suspect that the standard library in your compiler also doesn't know how to perform the conversion with the default locale, so it prints question marks instead of the actual characters. So add these two lines (the #include <locale> and the wcout.imbue(locale("")) call) and it should work. If not, then provide information about your platform and compiler.
Upvotes: 2