merours

Reputation: 4106

Interpreting escaped unicode chars in C++

Let's say I have a file called test.txt containing this text:

\u003cdiv style=\"text-align: left;\" trbidi=\"on\"\u003e\nAppending is not creating

If I read it line by line and print each line, here is what it looks like:

Code 1 : reading from file

#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
    ifstream file("test.txt");
    string line;
    while (getline(file, line)) {
        cout << line << endl; // prints \u003cdiv style=\"text-align: left;\" trbidi=\"on\"\u003e\nAppending is not creating
    }
}

However, if I declare the same string inside the code, the escape sequences are interpreted and the characters they stand for are printed.

Code 2 : simple string

string line2("\u003cdiv style=\"text-align: left;\" trbidi=\"on\"\u003e\nAppending is not creating");
cout << line2 << endl; // prints <div style="text-align: left;" trbidi="on">, a line break, then Appending is not creating

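This is perfectly normal: \ is the escape character in C++ string literals, and the compiler replaces each escape sequence when the program is compiled. Text read from a file at run time gets no such treatment.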
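To make the difference concrete, here is a small sketch, assuming C++11 or later. The function names compiled_literal and raw_literal are only illustrative. A raw string literal keeps its backslashes, so it holds the same characters that getline returns from the file, whereas the ordinary literal has already been converted by the compiler.

```cpp
#include <string>

// Ordinary literal: the compiler has already turned the escape
// sequences into '<' and '>' by the time the program runs.
inline std::string compiled_literal() {
    return "\u003cdiv\u003e";
}

// Raw literal: backslashes are kept verbatim, which is what a
// line read from a text file looks like to the program.
inline std::string raw_literal() {
    return R"(\u003cdiv\u003e)";
}
```

The first function returns the 5 characters of <div>, while the second returns 15 characters that still start with a backslash.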

This raises a question: is it possible to get the same result with the first code (i.e., to have each line interpreted like the string defined in code 2)?

Upvotes: 1

Views: 196

Answers (1)

Aerlevsedi

Reputation: 36

The standard std::string class doesn't have any function for doing this. You would have to implement your own function that replaces each escape sequence with the character it represents. Such a function would look like this:

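#include <string>
using std::string;

// Replaces a few backslash escape sequences with the characters they denote.
string parse_escaped_characters(const string& s) {
    string s2;
    for (string::size_type i = 0; i < s.size(); ++i) {
        // Only treat the backslash as an escape if a character follows it.
        if (s[i] == '\\' && i + 1 < s.size()) {
            switch (s[i + 1]) {
                case 'n': s2 += '\n'; ++i; break;
                case '"': s2 += '"'; ++i; break;
                case '\\': s2 += '\\'; ++i; break;
                // and so on...
                default: s2 += s[i]; break; // unknown escape: keep the backslash
            }
        }
        else s2 += s[i];
    }
    return s2;
}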

Alternatively, you could look for a string-handling library that already provides this functionality.
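Since the text in the question also contains Unicode escapes with four hex digits, here is a minimal sketch of how the same idea could be extended to them. It rests on several assumptions: the output should be UTF-8, only code points below U+10000 are handled (surrogate pairs are not combined), and the names unescape, append_utf8 and hex_value are hypothetical helpers, not part of any library.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of a code point below U+10000 to out.
// Simplification: surrogate halves are encoded as-is, not paired.
void append_utf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Returns the value of a hex digit, or -1 if c is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes newline, tab, quote, backslash and four-digit Unicode escapes.
// Any other backslash is copied to the output unchanged.
std::string unescape(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '\\' || i + 1 >= s.size()) { out += s[i]; continue; }
        char next = s[i + 1];
        if (next == 'n') { out += '\n'; ++i; }
        else if (next == 't') { out += '\t'; ++i; }
        else if (next == '"') { out += '"'; ++i; }
        else if (next == '\\') { out += '\\'; ++i; }
        else if (next == 'u' && i + 5 < s.size()) {
            unsigned cp = 0;
            bool ok = true;
            for (std::size_t k = i + 2; k < i + 6; ++k) {
                int v = hex_value(s[k]);
                if (v < 0) { ok = false; break; }
                cp = cp * 16 + static_cast<unsigned>(v);
            }
            if (ok) { append_utf8(out, cp); i += 5; } // skip 'u' and 4 digits
            else out += s[i];                         // malformed: keep the backslash
        }
        else out += s[i];                             // unknown escape: keep the backslash
    }
    return out;
}
```

Calling unescape(line) on each line read in Code 1 before printing it should then give the same text as Code 2, provided the terminal expects UTF-8.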

Upvotes: 2
