eeshwr

Reputation: 268

How does HttpUtility.UrlDecode(string) work?

I am using HttpUtility.UrlDecode(string), but whenever I try to decode the string "%ab" it returns the "�" character, which creates a problem.

Upvotes: 0

Views: 180

Answers (3)

aevitas

Reputation: 3833

You can find the implementation of the method on the reference source page. It essentially walks the specified string character by character and converts '+', %XX, and %uXXXX sequences as necessary.

The problem you are facing right now most likely has to do with encoding. UrlDecode may return a char that isn't supported by the encoding you're displaying your string in, resulting in a "weird" char.

For the sake of completeness, here is the entire method:

    internal string UrlDecode(string value, Encoding encoding) {
        if (value == null) {
            return null;
        }

        int count = value.Length;
        UrlDecoder helper = new UrlDecoder(count, encoding);

        // go through the string's chars collapsing %XX and %uXXXX and
        // appending each char as char, with exception of %XX constructs
        // that are appended as bytes

        for (int pos = 0; pos < count; pos++) {
            char ch = value[pos];

            if (ch == '+') {
                ch = ' ';
            }
            else if (ch == '%' && pos < count - 2) {
                if (value[pos + 1] == 'u' && pos < count - 5) {
                    int h1 = HttpEncoderUtility.HexToInt(value[pos + 2]);
                    int h2 = HttpEncoderUtility.HexToInt(value[pos + 3]);
                    int h3 = HttpEncoderUtility.HexToInt(value[pos + 4]);
                    int h4 = HttpEncoderUtility.HexToInt(value[pos + 5]);

                    if (h1 >= 0 && h2 >= 0 && h3 >= 0 && h4 >= 0) {   // valid 4 hex chars
                        ch = (char)((h1 << 12) | (h2 << 8) | (h3 << 4) | h4);
                        pos += 5;

                        // only add as char
                        helper.AddChar(ch);
                        continue;
                    }
                }
                else {
                    int h1 = HttpEncoderUtility.HexToInt(value[pos + 1]);
                    int h2 = HttpEncoderUtility.HexToInt(value[pos + 2]);

                    if (h1 >= 0 && h2 >= 0) {     // valid 2 hex chars
                        byte b = (byte)((h1 << 4) | h2);
                        pos += 2;

                        // don't add as char
                        helper.AddByte(b);
                        continue;
                    }
                }
            }

            if ((ch & 0xFF80) == 0)
                helper.AddByte((byte)ch); // 7 bit have to go as bytes because of Unicode
            else
                helper.AddChar(ch);
        }

        return Utf16StringValidator.ValidateString(helper.GetString());
    }
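
The key point in the method above is that %XX escapes are collected as raw bytes and only turned into characters at the end, using the supplied encoding. A minimal sketch of that idea follows. It is not the framework's code: PercentDecodeSketch and its HexToInt helper are hypothetical names, %uXXXX handling is left out, and it assumes every unescaped character in the input is ASCII.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class PercentDecodeSketch
{
    // Hypothetical helper: returns 0-15 for a hex digit, -1 otherwise.
    static int HexToInt(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Simplified decoder: handles '+' and %XX only.
    // Assumption: all unescaped characters are ASCII, so they fit in one byte.
    public static string Decode(string value, Encoding encoding)
    {
        var bytes = new List<byte>();

        for (int pos = 0; pos < value.Length; pos++)
        {
            char ch = value[pos];

            if (ch == '+')
            {
                ch = ' ';
            }
            else if (ch == '%' && pos < value.Length - 2)
            {
                int h1 = HexToInt(value[pos + 1]);
                int h2 = HexToInt(value[pos + 2]);

                if (h1 >= 0 && h2 >= 0)
                {
                    // Store the escape as a raw byte, not as a char.
                    bytes.Add((byte)((h1 << 4) | h2));
                    pos += 2;
                    continue;
                }
            }

            bytes.Add((byte)ch);
        }

        // The bytes only become characters here, via the chosen encoding.
        return encoding.GetString(bytes.ToArray());
    }
}
```

With this sketch, Decode("%ab", Encoding.UTF8) yields the single byte 0xAB, which the UTF-8 decoder cannot map to a character, so it substitutes U+FFFD, the "�" from the question.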

Upvotes: 0

CoBolt

Reputation: 442

https://msdn.microsoft.com/en-us/library/adwtk1fy(v=vs.110).aspx

Converts a string that has been encoded for transmission in a URL into a decoded string.

Url encoding reference: http://www.w3schools.com/tags/ref_urlencode.asp

It's most likely a UTF-8 URL you're trying to decode, and '%ab' doesn't reference anything on its own: in UTF-8 the byte 0xAB is a continuation byte, which is only valid after a lead byte. That's why you're getting the '�' character. The decoder doesn't know what character to decode it as.

If you try to decode something like 'this%20is%20a%20text', it will return 'this is a text', because %20 is the space character.
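
To check whether a decoded byte sequence is well-formed UTF-8, one option is to decode it with a UTF8Encoding that throws on invalid bytes instead of substituting '�'. This is only a sketch; Utf8ByteCheck and IsValidUtf8 are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text;

static class Utf8ByteCheck
{
    // Returns true if the bytes form well-formed UTF-8, false otherwise.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // Second argument: throw on invalid bytes rather than emit U+FFFD.
        var strict = new UTF8Encoding(false, true);

        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

Here IsValidUtf8(new byte[] { 0xAB }) is false, while IsValidUtf8(new byte[] { 0xC2, 0xAB }) is true, because 0xC2 0xAB is the UTF-8 encoding of '«'.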

Upvotes: 1

Ivan Vasiljevic

Reputation: 5708

If you look at this link, you can see that you can pass an encoding as a parameter to the function. I would play with this; most likely, the string you are decoding was not encoded as UTF-8.
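
As a rough illustration of passing an encoding, the sketch below uses the HttpUtility.UrlDecode(string, Encoding) overload with ISO-8859-1. Whether ISO-8859-1 is the right encoding for your data is an assumption; it is chosen here only because it maps every single byte to a character. DecodeWithEncoding and DecodeLatin1 are hypothetical names, and the project needs a reference to System.Web.

```csharp
using System.Text;
using System.Web;

static class DecodeWithEncoding
{
    // Decodes a URL-encoded string, interpreting %XX bytes as ISO-8859-1.
    // Assumption: the sender encoded the text as ISO-8859-1, not UTF-8.
    public static string DecodeLatin1(string value)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        return HttpUtility.UrlDecode(value, latin1);
    }
}
```

Under that assumption, DecodeLatin1("%ab") returns "«" (U+00AB), whereas HttpUtility.UrlDecode("%ab") uses UTF-8 and returns "�".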

Upvotes: 1
