Akki

Reputation: 823

What characters should I encode always?

I'm trying to encode the URL before making the HTTP call, but I get no response from the server (response code 0). Without encoding, everything works fine.

Requirement:

Need to encode special characters in URL. Specifically, I need to encode the below characters plus any other such characters:

! * ' ( ) ; : @ & = + $ , / ? % # [ ]

This is my understanding of URL encoding:

Only alphanumeric characters and a few special characters ("-", ".", "_", "~") may be used unencoded within a URL. All other characters transmitted as part of the URL, whether in the query string or in a path segment, must always be encoded. Reference.

I am unsure about "*", since on Android it can also be left unencoded. Reference

I tried encoding the whole URL, as well as individual params, but I'm getting the same result.

Below is my encoding function:

void Url::escape(String& str)
{
    if(str.isEmpty())
    {
        return;
    }
    uint32_t i = 0;
    unsigned char in;
    char hexbuf[4];

    while(i < str.size())
    {
        in = static_cast<unsigned char>(str[i]);

        switch(in)
        {
            case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g': case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n': case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u': case 'v': case 'w': case 'x': case 'y': case 'z': case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G': case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U': case 'V': case 'W': case 'X': case 'Y': case 'Z': case '_': case '~': case '.': case '-':
                break;
            default:
                str[i] = '%';
                snprintf(hexbuf, 3, "%02X", in);
                str.insert(hexbuf, i + 1);
                i += 2;
                break;
        }
        ++i;
    }
}

Am I missing anything here? I would also like clarity on the following:

  1. Can I call Url::escape() on the whole input url String, or should I encode individual query params only?

  2. What characters can/should I omit in encoding (is Url::escape() correct or not)?

I went through multiple references, but I could not find a concrete solution.

Upvotes: 0

Views: 137

Answers (1)

Remy Lebeau

Reputation: 596111

  1. Can I call Url::escape() on the whole input url String, or should I encode individual query params only?

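You must encode the individual components of the URL separately (not just the query params) and then assemble them. Calling Url::escape() on the whole URL also encodes the delimiters that separate the components (":", "/", "?", "&", "="), so the result is no longer a valid URL.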
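
As a minimal sketch of that idea, assuming plain std::string instead of the String class from the question: percentEncode() and buildUrl() are hypothetical names introduced here for illustration, and the encoder uses the strictest rule (only RFC 3986 unreserved characters pass through). The point is that the delimiters are written literally and only the data between them is encoded.

```cpp
#include <string>

// Hypothetical helper: percent-encodes every byte that is not an RFC 3986
// "unreserved" character (ALPHA / DIGIT / "-" / "." / "_" / "~").
std::string percentEncode(const std::string& in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size() * 3);
    for (unsigned char c : in)
    {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved)
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += '%';
            out += hex[c >> 4];   // high nibble
            out += hex[c & 0x0F]; // low nibble
        }
    }
    return out;
}

// Hypothetical helper: assembles a URL with one path segment and one query
// parameter. The "/", "?" and "=" delimiters are never passed to the encoder.
std::string buildUrl(const std::string& base,
                     const std::string& pathSegment,
                     const std::string& key,
                     const std::string& value)
{
    return base + "/" + percentEncode(pathSegment) +
           "?" + percentEncode(key) + "=" + percentEncode(value);
}
```

With this sketch, buildUrl("http://example.com", "my file", "q", "a&b=c") keeps the scheme and delimiters intact, whereas percentEncode("http://example.com") turns "://" into "%3A%2F%2F", which is the kind of mangled URL you get from encoding the whole string.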

  2. What characters can/should I omit in encoding (is Url::escape() correct or not)?

Different components have different rules about which characters must be encoded and which may be left as they are.

RFC 3986: Uniform Resource Identifier (URI): Generic Syntax outlines the basic rules that all URLs have to conform to. However, there are lots of URL schemes, many of which provide additional rules/restrictions on top of this syntax for scheme-specific components, and have their own RFCs.
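
To illustrate how the rules differ per component, here is a rough sketch based on the generic RFC 3986 grammar only (scheme-specific rules may be stricter). The function names are hypothetical, and the query encoder assumes the common key=value&key=value layout, so it deliberately encodes "&", "=" and "+" even though the generic grammar permits them in a query.

```cpp
#include <string>

// Hypothetical helper: percent-encodes every byte of `in` that is neither an
// RFC 3986 unreserved character nor listed in `alsoAllowed`.
std::string encodeExcept(const std::string& in, const std::string& alsoAllowed)
{
    static const char hex[] = "0123456789ABCDEF";
    static const std::string unreserved =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789-._~";
    std::string out;
    for (unsigned char c : in)
    {
        const char ch = static_cast<char>(c);
        if (unreserved.find(ch) != std::string::npos ||
            alsoAllowed.find(ch) != std::string::npos)
        {
            out += ch;
        }
        else
        {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Path segment: RFC 3986 allows unreserved, sub-delims, ":" and "@".
// "/" is not in the set because it separates segments.
std::string encodePathSegment(const std::string& s)
{
    return encodeExcept(s, "!$&'()*+,;=:@");
}

// Query value: RFC 3986 additionally allows "/" and "?" in a query.
// Assumption: "&", "=" and "+" are left out of the allowed set because the
// query is treated as key=value pairs joined by "&".
std::string encodeQueryValue(const std::string& s)
{
    return encodeExcept(s, "!$'()*,;:@/?");
}
```

For example, encodePathSegment("a/b") yields "a%2Fb", while encodeQueryValue("c/d?e") leaves the string unchanged. Note that "%" is always encoded in this sketch, so input that is already percent-encoded would be encoded a second time.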

Upvotes: 2
