Reputation: 31206
For a parsing job, I have a string that can be essentially anything. For instance:
"something \t \n \0 \whatever else"
At the end of the parse, I need to serialize the output to easy-to-work-with JSON, which means I need to get rid of the control characters. For all value entries, I run a string sanitizer:
    #include <iostream>
    #include <string>

    void sanitizer(std::string & value){
        for (auto& sit : value){
            if ((int) sit <= 31 || (int) sit == 127){
                // Control characters: escape the common ones, blank the rest.
                if (sit == '\t')
                    std::cout << "\\t";
                else if (sit == '\r')
                    std::cout << "\\r";
                else if (sit == '\0')
                    std::cout << "\\0";
                else if (sit == '\n')
                    std::cout << "\\n";
                else
                    std::cout << " ";
            } else if (sit == '"'){
                std::cout << '\'';
            } else if (sit == '\\')
                std::cout << "/";
            else
                std::cout << sit;
        }
    }
However, this function alone accounts for about 44% of the time spent in the parser.
When I eliminate the std::cout calls and instead build a string, then print it to cout, this slows things down further.
Is there an optimized way to replace/escape these control characters in a string with C++?
Upvotes: 2
Views: 1578
Reputation: 5323
Would something like the following work for your purposes?
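    #include <algorithm>
    #include <cctype>
    #include <clocale>
    #include <string>

    void sanitizer(std::string & value) {
        // Remember the current locale so it can be restored afterwards.
        std::string prev_loc = std::setlocale(LC_ALL, nullptr);
        std::setlocale(LC_ALL, "en_US.iso88591");
        std::replace_if(value.begin(), value.end(), [](unsigned char c){ return std::iscntrl(c); }, ' ');
        std::setlocale(LC_ALL, prev_loc.c_str());
    }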
Upvotes: 0
Reputation: 136238
One way is to use the std::iscntrl function along with std::remove_if:
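    #include <algorithm>
    #include <cctype>
    #include <string>

    void remove_control_characters(std::string& s) {
        // Take unsigned char: passing a negative char to std::iscntrl is undefined behavior.
        s.erase(std::remove_if(s.begin(), s.end(), [](unsigned char c) { return std::iscntrl(c); }), s.end());
    }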
A further improvement would be to implement your own character classification function. std::iscntrl uses the current global locale object for that.
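To illustrate the hand-rolled classification idea, here is a minimal sketch. The names is_ascii_control and escape_control_characters are made up for this example, and the control range (0 to 31 plus 127) and the escape mapping are simply copied from the question, so treat it as a starting point under those assumptions rather than a measured fix. The escaping variant appends to a pre-reserved std::string, which may avoid repeated reallocations; profile it against your own data before relying on it.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: treats ASCII 0-31 and 127 as control characters
// (the same range the question tests for) without consulting any locale.
inline bool is_ascii_control(unsigned char c) {
    return c <= 31 || c == 127;
}

// Erase-remove idiom with the locale-independent predicate.
void remove_control_characters(std::string& s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return is_ascii_control(c); }),
            s.end());
}

// Escaping variant that follows the mapping used in the question.
// Returns a new string instead of writing to std::cout per character.
std::string escape_control_characters(const std::string& in) {
    std::string out;
    out.reserve(in.size() + in.size() / 8);  // assumption: escapes are rare
    for (unsigned char c : in) {
        switch (c) {
            case '\t': out += "\\t"; break;
            case '\r': out += "\\r"; break;
            case '\0': out += "\\0"; break;
            case '\n': out += "\\n"; break;
            case '"':  out += '\''; break;
            case '\\': out += '/'; break;
            default:
                // Remaining control characters become a space; the rest pass through.
                out += is_ascii_control(c) ? ' ' : static_cast<char>(c);
        }
    }
    return out;
}
```

The caller can then write the whole result in one go, e.g. std::cout << escape_control_characters(value);, so the stream is touched once per value rather than once per character. Bytes at or above 0x80 pass through unchanged because the loop works on unsigned char.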
Upvotes: 5