Reputation: 17080
I searched the web a lot and didn't find a C++ function that replaces XML special characters with their escape sequences. Is there something like this?
I know about the following:
Special Character   Escape Sequence   Purpose
&                   &amp;             Ampersand sign
'                   &apos;            Single quote
"                   &quot;            Double quote
>                   &gt;              Greater than
<                   &lt;              Less than
Are there more? What about writing a hexadecimal value like 0x00? Is this also a problem?
Upvotes: 8
Views: 15183
Reputation: 81
These types of functions should be standard and we should never have to rewrite them. If you are using VS, have a look at atlenc.h. This file is part of the VS installation. Inside the file there is a function called EscapeXML which is much more complete than any of the examples above.
Upvotes: 8
Reputation: 950
I slightly modified Ferruccio's solution to also eliminate the other characters that are in the way, such as anything < 0x20 and so on (found somewhere on the Internet). Tested and working.
void strip_tags(string* s) {
regex kj("</?(.*)>");
*s = regex_replace(*s, kj, "", boost::format_all);
std::map<char, std::string> transformations;
transformations['&'] = std::string("&amp; ");
transformations['\''] = std::string("&apos; ");
transformations['"'] = std::string("&quot; ");
transformations['>'] = std::string("&gt; ");
transformations['<'] = std::string("&lt; ");
// Build list of characters to be searched for.
//
std::string reserved_chars;
for ( std::map<char, std::string>::iterator ti = transformations.begin(); ti != transformations.end(); ti++)
{
reserved_chars += ti->first;
}
size_t pos = 0;
while (std::string::npos != (pos = (*s).find_first_of(reserved_chars, pos)))
{
s->replace(pos, 1, transformations[(*s)[pos]]);
pos++;
}
}
string removeTroublesomeCharacters(string inString)
{
if (inString.empty()) return "";
string newString;
char ch;
for (size_t i = 0; i < inString.length(); i++)
{
ch = inString[i];
// Keep characters in the range 0x20..0xFC plus tab, newline and carriage return;
// drop everything else. Where char is signed, bytes >= 0x80 compare as negative
// and are dropped as well, so non-ASCII text does not survive this filter.
if ((ch < 0x00FD && ch > 0x001F) || ch == '\t' || ch == '\n' || ch == '\r')
{
newString.push_back(ch);
}
}
return newString;
}
So in this case, there are two functions. We can get the result with something like:
string StartingString("Some_value");
// strip_tags returns void and modifies the string in place, so call it first.
strip_tags(&StartingString);
string FinalString = removeTroublesomeCharacters(StartingString);
Hope it helps!
(Oh yeah: credit for the other function goes to the author of the answer here: How do you remove invalid hexadecimal characters from an XML-based data source prior to constructing an XmlReader or XPathDocument that uses the data? )
Upvotes: 1
Reputation: 100668
Writing your own is easy enough, but scanning the string multiple times to search/replace individual characters can be inefficient:
std::string escape(const std::string& src) {
std::stringstream dst;
for (char ch : src) {
switch (ch) {
case '&': dst << "&amp;"; break;
case '\'': dst << "&apos;"; break;
case '"': dst << "&quot;"; break;
case '<': dst << "&lt;"; break;
case '>': dst << "&gt;"; break;
default: dst << ch; break;
}
}
return dst.str();
}
Note: I used a C++11 range-based for loop for convenience, but you can easily do the same thing with an iterator.
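For comparison, here is a minimal sketch of the same single-pass idea that appends to a plain std::string instead of going through a stringstream. The function name escape_xml_single_pass is hypothetical, and the reserve() call is only a lower bound on the final size, not a measured optimisation.

```cpp
#include <string>

// Hypothetical helper (not from the answer above): escapes the five
// predefined XML entities in one pass over the input.
std::string escape_xml_single_pass(const std::string& src) {
    std::string dst;
    dst.reserve(src.size());  // lower bound; the string grows if escapes are added
    for (char ch : src) {
        switch (ch) {
            case '&':  dst += "&amp;";  break;
            case '\'': dst += "&apos;"; break;
            case '"':  dst += "&quot;"; break;
            case '<':  dst += "&lt;";   break;
            case '>':  dst += "&gt;";   break;
            default:   dst += ch;       break;  // everything else is copied as-is
        }
    }
    return dst;
}
```

Calling escape_xml_single_pass("a < b & c") should give "a &lt; b &amp; c". Control characters are copied through unchanged, so this sketch does not make arbitrary input valid XML by itself.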
Upvotes: 11
Reputation: 121971
As has been stated, it would be possible to write your own. For example:
#include <iostream>
#include <string>
#include <map>
int main()
{
std::string xml("a < > & ' \" string");
std::cout << xml << "\n";
// Characters to be transformed.
//
std::map<char, std::string> transformations;
transformations['&'] = std::string("&amp;");
transformations['\''] = std::string("&apos;");
transformations['"'] = std::string("&quot;");
transformations['>'] = std::string("&gt;");
transformations['<'] = std::string("&lt;");
// Build list of characters to be searched for.
//
std::string reserved_chars;
for (auto ti = transformations.begin(); ti != transformations.end(); ti++)
{
reserved_chars += ti->first;
}
size_t pos = 0;
while (std::string::npos != (pos = xml.find_first_of(reserved_chars, pos)))
{
xml.replace(pos, 1, transformations[xml[pos]]);
pos++;
}
std::cout << xml << "\n";
return 0;
}
Output:
a < > & ' " string
a &lt; &gt; &amp; &apos; &quot; string
Add an entry to transformations to introduce a new transformation.
Upvotes: 6
Reputation: 40055
It appears that you want to generate XML yourself. I think you'll need to be a lot clearer, and read up on the XML specification, if you want to be successful. Those are the only XML special characters. You say "I know there is more special character, lke foreign languages and currency signs", but these are not defined in XML, unless you mean encoding them as code points (&#163; for example). Are you thinking of HTML, or some other DTD?
The only way to avoid double encoding is to encode things only once. If you get the string "&gt;", how do you know whether it's already encoded and I wanted to represent the string ">", or whether I want to represent the literal string "&gt;"?
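To make the double-encoding problem concrete, here is a small sketch. The helper escape_once is hypothetical and handles only &, < and >, which is enough to show the effect: running it on text that is already escaped escapes the ampersand again.

```cpp
#include <string>

// Hypothetical helper for illustration: escapes only '&', '<' and '>'.
std::string escape_once(const std::string& src) {
    std::string dst;
    for (char ch : src) {
        switch (ch) {
            case '&': dst += "&amp;"; break;
            case '<': dst += "&lt;";  break;
            case '>': dst += "&gt;";  break;
            default:  dst += ch;      break;
        }
    }
    return dst;
}
```

escape_once(">") gives "&gt;", but escape_once("&gt;") gives "&amp;gt;". The function cannot tell whether its input was already escaped, which is why the escaping has to happen exactly once, at serialisation time.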
The best way is to represent your XML as a DOM (with strings as un-encoded strings), and use an XML serialiser like Xerces.
Oh, and remember there's no way to represent characters under 0x20 in XML 1.0 (apart from &#x9;, &#xA; and &#xD;, which are whitespace).
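As a rough sketch of that rule, the check below follows the Char production of XML 1.0, which allows #x9, #xA, #xD, #x20 to #xD7FF, #xE000 to #xFFFD and #x10000 to #x10FFFF. The function name is_valid_xml10_char is hypothetical, and it works on decoded code points, so decoding UTF-8 bytes into code points is assumed to happen elsewhere.

```cpp
#include <string>

// Hypothetical helper: true if the code point may appear in an XML 1.0 document.
bool is_valid_xml10_char(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||   // tab, line feed, carriage return
           (cp >= 0x20 && cp <= 0xD7FF) ||          // excludes the surrogate range
           (cp >= 0xE000 && cp <= 0xFFFD) ||        // excludes 0xFFFE and 0xFFFF
           (cp >= 0x10000 && cp <= 0x10FFFF);       // supplementary planes
}
```

Under this check 0x00 is rejected, so a NUL character cannot be written into XML 1.0 at all, not even as a character reference. Binary data is usually carried in some text encoding such as Base64 instead.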
Upvotes: 0
Reputation: 117701
There is such a function now, because I just wrote it:
void replace_all(std::string& str, const std::string& old, const std::string& repl) {
size_t pos = 0;
while ((pos = str.find(old, pos)) != std::string::npos) {
str.replace(pos, old.length(), repl);
pos += repl.length();
}
}
std::string escape_xml(std::string str) {
replace_all(str, std::string("&"), std::string("&amp;"));
replace_all(str, std::string("'"), std::string("&apos;"));
replace_all(str, std::string("\""), std::string("&quot;"));
replace_all(str, std::string(">"), std::string("&gt;"));
replace_all(str, std::string("<"), std::string("&lt;"));
return str;
}
Upvotes: 2