RikSaunderson

C++: Convert occurrences of hexadecimal ASCII values in a string to normal chars

I have a std::string which contains letters and numbers as normal, but all punctuation (e.g. brackets, square brackets, commas and colons) is encoded as '%' followed by its hexadecimal ASCII value (e.g. %28, %29, %2C and %3A).

What would be the fastest way to parse my string, leaving 'normal' characters alone and converting each hexadecimal escape back to the character it represents?

Upvotes: 1

Views: 711

Answers (3)

Kerrek SB

Here's an in-place version:

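#include <cstddef>
#include <string>

// Converts one hex digit to its value; returns false if c is not a hex digit.
bool hex_digit(char c, unsigned char & value)
{
    if (c >= '0' && c <= '9') { value = c - '0'; return true; }

    if (c >= 'a' && c <= 'f') { value = c - 'a' + 10; return true; }

    if (c >= 'A' && c <= 'F') { value = c - 'A' + 10; return true; }

    return false;
}

// Decodes the two hex digits at s[offset] and s[offset + 1] into result.
// The caller must ensure that offset + 1 < s.size().
bool is_hex(std::string const & s, std::size_t offset, char & result)
{
    unsigned char d1, d2;
    if (hex_digit(s[offset], d1) && hex_digit(s[offset + 1], d2))
    {
        result = static_cast<char>(d1 * 16 + d2);
        return true;
    }
    return false;
}

// Decodes %XX escapes (and "%%" as a literal '%') in place.
// r is the read position, w the write position; w never overtakes r.
void unescape(std::string & s)
{
    std::size_t w = 0;
    for (std::size_t r = 0; r != s.size(); )
    {
        char digit;

        if (s[r] != '%')
        {
            s[w++] = s[r++];        // ordinary character: copy it down
        }
        else if (r + 1 < s.size() && s[r + 1] == '%')
        {
            s[w++] = '%';           // "%%" -> '%'
            r += 2;
        }
        else if (r + 2 < s.size() && is_hex(s, r + 1, digit))
        {
            s[w++] = digit;         // "%XX" -> the decoded character
            r += 3;
        }
        else
        {
            // Malformed escape: keep the '%' unchanged (or throw an exception here).
            s[w++] = s[r++];
        }
    }

    s.resize(w);                    // drop the leftover tail
}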

Upvotes: 0

perreal

libcurl has the function curl_easy_unescape:

char *curl_easy_unescape(CURL *curl, const char *input,
        int inlength, int *outlength);

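This function converts the given URL-encoded input string to a "plain string" and returns it in newly allocated memory, which must be released with curl_free(). All input characters that are URL-encoded (%XX, where XX is a two-digit hexadecimal number) are converted to their binary versions. If inlength is 0, the function uses strlen() on the input; if outlength is non-NULL, the length of the result is written to it.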

Upvotes: 1

Some programmer dude

You could use std::string::find to search for the '%' character. If the next two characters are hexadecimal digits, replace the three characters with the actual character. Repeat this in a loop, resuming each search just after the character you replaced, until no more '%' is found.
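A minimal sketch of that find-and-replace loop, assuming std::string::find and std::string::replace as the primitives and std::stoi (base 16) to parse the two hex digits; the function name unescape_with_find is hypothetical:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes %XX escapes in place by searching for '%'
// with std::string::find and replacing each valid three-character escape
// with the single character it encodes.
void unescape_with_find(std::string & s)
{
    std::size_t pos = s.find('%');
    while (pos != std::string::npos)
    {
        if (pos + 2 < s.size()
            && std::isxdigit(static_cast<unsigned char>(s[pos + 1]))
            && std::isxdigit(static_cast<unsigned char>(s[pos + 2])))
        {
            // Parse the two hex digits and overwrite "%XX" with one char.
            char decoded = static_cast<char>(std::stoi(s.substr(pos + 1, 2), nullptr, 16));
            s.replace(pos, 3, 1, decoded);
        }
        // Resume after the character just written (or after a lone '%'),
        // so a decoded '%' is never decoded a second time.
        pos = s.find('%', pos + 1);
    }
}
```

Each replace shifts the rest of the string left, so a string full of escapes costs O(n²) in the worst case; building a separate output string avoids that.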

Instead of replacing in place, you could iterate over the string, appending normal characters to a second (output) string; when you reach a '%', check that it starts a valid URL escape and, if so, append the decoded character to the output string.
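A sketch of the output-string approach, under the assumption that a '%' not followed by two hex digits should be copied through unchanged; the names url_unescape and hex_value are hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the value of a hex digit, or -1 if c is not one.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: builds a decoded copy of `in`. Ordinary characters are
// appended as-is, valid %XX escapes are appended as the character they encode,
// and a '%' that does not start a valid escape is copied through unchanged.
std::string url_unescape(std::string const & in)
{
    std::string out;
    out.reserve(in.size());  // the output is never longer than the input

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '%' && i + 2 < in.size())
        {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip both hex digits; the loop's ++i skips the rest
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

Each input character is examined once, so this runs in linear time, at the cost of one extra string.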

Upvotes: 2