Chris

Reputation: 31206

C++: Optimal way to replace control characters in a string

For a parsing job, I have a string that can contain essentially anything. For instance:

"something \t \n \0 \whatever else"

At the end of the parse, I need to serialize the output to easy-to-work-with JSON, which means I need to get rid of the control characters. For all value entries, I run a string sanitizer:

#include <iostream>
#include <string>

void sanitizer(std::string & value){
   for (char c : value){
     // Compare as unsigned char so bytes >= 0x80 are not mistaken for control characters.
     const unsigned char u = static_cast<unsigned char>(c);
     if (u <= 31 || u == 127){
        if (c == '\t')
            std::cout << "\\t";
        else if (c == '\r')
            std::cout << "\\r";
        else if (c == '\0')
            std::cout << "\\0";
        else if (c == '\n')
            std::cout << "\\n";
        else
            std::cout << " ";   // any other control character becomes a space
      } else if (c == '"'){
          std::cout << '\'';
      } else if (c == '\\')
          std::cout << "/";
      else
          std::cout << c;
    }
}

However, this function alone accounts for about 44% of the parser's run time.

When I eliminate the std::cout calls and instead build a string and then print it to cout, things slow down further.
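
For reference, a minimal sketch of what the string-building variant might look like, assuming the same escape rules as the sanitizer above. The name sanitize_to_string is hypothetical, and the reserve call is only a guess at avoiding repeated reallocation; this has not been profiled:

```cpp
#include <string>

// Hypothetical variant: returns the escaped text instead of printing it.
std::string sanitize_to_string(const std::string& value) {
    std::string out;
    // Worst case: every input character turns into a two-character escape.
    out.reserve(value.size() * 2);
    for (char c : value) {
        switch (c) {
            case '\t': out += "\\t"; break;
            case '\r': out += "\\r"; break;
            case '\0': out += "\\0"; break;
            case '\n': out += "\\n"; break;
            case '"':  out += '\''; break;
            case '\\': out += '/'; break;
            default: {
                // Treat the byte as unsigned so values >= 0x80 pass through unchanged.
                const unsigned char u = static_cast<unsigned char>(c);
                out += (u <= 31 || u == 127) ? ' ' : c;
            }
        }
    }
    return out;
}
```

The caller would then write the result once, for example std::cout << sanitize_to_string(value);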


Is there an optimized way to replace/escape these control characters in a string with C++?

Upvotes: 2

Views: 1578

Answers (2)

Aaron S

Reputation: 5323

Would something like the code below work for your purposes?

void sanitizer(std::string & value) {
    std::string prev_loc = std::setlocale(LC_ALL, nullptr);
    std::setlocale(LC_ALL, "en_US.iso88591");
    std::replace_if(value.begin(), value.end(), [](unsigned char c){ return std::iscntrl(c); }, ' ');
    std::setlocale(LC_ALL, prev_loc.c_str());
}

Upvotes: 0

Maxim Egorushkin

Reputation: 136238

One way is to use std::iscntrl function along with std::remove_if:

void remove_control_characters(std::string& s) {
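    // Take the character as unsigned char: passing a negative char to std::iscntrl is undefined behavior.
    s.erase(std::remove_if(s.begin(), s.end(), [](unsigned char c) { return std::iscntrl(c) != 0; }), s.end());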
}

A further improvement would be to implement your own character classification function. std::iscntrl uses the current global (C) locale for that, which can add overhead to every call.
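
A minimal sketch of such a hand-written classifier, assuming only the ASCII control characters (0 through 31 and 127) need to be removed. The names is_ascii_control and remove_ascii_control_characters are hypothetical, and whether this is actually faster would need to be measured:

```cpp
#include <algorithm>
#include <string>

// Hypothetical locale-independent test: true for ASCII 0..31 and 127 (DEL).
inline bool is_ascii_control(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    return u < 0x20 || u == 0x7F;
}

// Same erase/remove_if idiom as above, but with no locale lookup per character.
void remove_ascii_control_characters(std::string& s) {
    s.erase(std::remove_if(s.begin(), s.end(), is_ascii_control), s.end());
}
```

Because the test is a plain comparison, bytes at or above 0x80 (for example UTF-8 continuation bytes) are left untouched regardless of the installed locale.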

Upvotes: 5
