Reputation: 3765
I have a std::string that contains letters and digits as normal, but all punctuation (e.g. brackets, square brackets, commas and colons) is percent-encoded as its hexadecimal ASCII code (e.g. %28, %29, %2C and %3A).
What would be the fastest way to parse my string, leaving 'normal' characters alone and converting the hexadecimal escapes back to the characters they represent?
Upvotes: 1
Views: 711
Reputation: 477010
Here's an in-place version:
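#include <cstddef>
#include <stdexcept>
#include <string>

// Converts one hexadecimal digit to its value; returns false if c is not a hex digit.
bool hex_digit(char c, unsigned char & value)
{
    if (c >= '0' && c <= '9') { value = c - '0'; return true; }
    if (c >= 'a' && c <= 'f') { value = c - 'a' + 10; return true; }
    if (c >= 'A' && c <= 'F') { value = c - 'A' + 10; return true; }
    return false;
}

// Decodes the two hex digits at s[offset] and s[offset + 1] into result.
bool is_hex(std::string const & s, std::size_t offset, char & result)
{
    unsigned char d1, d2;
    if (hex_digit(s[offset], d1) && hex_digit(s[offset + 1], d2))
    {
        result = static_cast<char>(d1 * 16 + d2);
        return true;
    }
    return false;
}

void unescape(std::string & s)
{
    std::size_t w = 0;  // write position; never ahead of the read position r
    for (std::size_t r = 0; r != s.size(); )
    {
        char digit;
        if (s[r] != '%')
        {
            s[w++] = s[r++];  // ordinary character: copy it down
        }
        else if (r + 1 < s.size() && s[r + 1] == '%')
        {
            s[w++] = '%';     // "%%" is treated as a literal '%'
            r += 2;
        }
        else if (r + 2 < s.size() && is_hex(s, r + 1, digit))
        {
            s[w++] = digit;   // "%XX" becomes the single decoded character
            r += 3;
        }
        else
        {
            // Malformed escape. Throwing is one option; without it (or some
            // other way of advancing r) the loop would never terminate.
            throw std::invalid_argument("malformed escape sequence");
        }
    }
    s.erase(s.begin() + w, s.end());  // drop the leftover tail after the last write
}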
Upvotes: 0
Reputation: 97948
libcurl has the function curl_easy_unescape:
char *curl_easy_unescape(CURL *curl, char *url, int inlength, int *outlength);
This function converts the given URL encoded input string to a "plain string" and returns that in an allocated memory area. All input characters that are URL encoded (%XX where XX is a two-digit hexadecimal number) are converted to their binary versions.
Upvotes: 1
Reputation: 409166
You could use e.g. the find function to search for the '%' character. If the next two characters are hexadecimal digits, then replace the three characters with the actual character. Do all this in a loop while you find '%'.
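A minimal sketch of that find-based loop might look like the following. The names unescape_with_find and hex_value are made up for illustration, and leaving malformed escapes untouched is an assumption rather than something the question specifies.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit.
// The caller is assumed to have checked the character with std::isxdigit.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return c - 'A' + 10;
}

// Hypothetical name: decodes every valid %XX in place using std::string::find.
// Malformed escapes (assumption) are left in the string as they are.
void unescape_with_find(std::string & s)
{
    std::size_t pos = 0;
    while ((pos = s.find('%', pos)) != std::string::npos)
    {
        if (pos + 2 < s.size()
            && std::isxdigit(static_cast<unsigned char>(s[pos + 1]))
            && std::isxdigit(static_cast<unsigned char>(s[pos + 2])))
        {
            char decoded = static_cast<char>(
                hex_value(s[pos + 1]) * 16 + hex_value(s[pos + 2]));
            s.replace(pos, 3, 1, decoded);  // three characters become one
        }
        ++pos;  // step past the decoded character or the stray '%'
    }
}
```

Each replace shifts the rest of the string, so this is simple rather than fast when the input contains many escapes.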
Instead of doing in-place replacement, you could iterate over the string, appending normal characters to another string, and when you reach a '%' you check that it's a valid URL escape, and append the proper character to the output string.
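The copying variant could be sketched roughly as below. The names unescaped_copy and hex_digit_value are hypothetical, and copying an invalid escape through unchanged is an assumption; reporting an error would be just as reasonable.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns 0-15 for a hex digit, -1 for anything else.
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical name: returns a decoded copy and leaves the input untouched.
std::string unescaped_copy(std::string const & in)
{
    std::string out;
    out.reserve(in.size());  // the result is never longer than the input
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '%' && i + 2 < in.size())
        {
            int hi = hex_digit_value(in[i + 1]);
            int lo = hex_digit_value(in[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);  // normal character, or an invalid escape (assumption)
    }
    return out;
}
```

This makes a single pass and a single allocation, at the cost of holding a second string in memory.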
Upvotes: 2