ccpizza

Reputation: 31801

Java URI unescaper that works like JavaScript's unescape

I have a string like http://google.com/search/q=<%= name %>.

A third-party JavaScript library that I have no control over escapes this to "http://google.com/search/q=%3C%=%20name%20%%3E"

which JavaScript can successfully unescape back to the original string with

unescape("http://google.com/search/q=%3C%=%20name%20%%3E")

But Java's URLDecoder.decode("http://google.com/search/q=%3C%=%20name%20%%3E") throws an IllegalArgumentException because of the literal, unescaped % characters in the string. That is of course correct and according to spec, but it makes server-side processing complicated.
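For reference, a minimal reproduction of the failure might look like the sketch below. The class and method names (StrictDecodeDemo, strictDecodeFails) are made up for illustration, and UTF-8 is an assumed charset; the question does not say which one is in use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class StrictDecodeDemo {
    // Hypothetical helper: returns true when URLDecoder rejects the input.
    static boolean strictDecodeFails(String s) {
        try {
            URLDecoder.decode(s, "UTF-8"); // charset is an assumption
            return false;
        } catch (IllegalArgumentException e) {
            // Raised for a '%' that is not followed by two hex digits.
            return true;
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String escaped = "http://google.com/search/q=%3C%=%20name%20%%3E";
        System.out.println(strictDecodeFails(escaped)); // prints true
    }
}
```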

Before I try to fix the bad JS escape on the server side with regular expressions (because, as mentioned, I cannot modify the JS side), I would like to know if there is a more permissive Java URL/URI decoding API that works the same way as JavaScript's unescape, i.e. one that ignores standalone '%' characters and only decodes whatever is decodable.
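The regex fallback mentioned above could be sketched roughly as follows: re-escape every '%' that is not followed by two hex digits as %25, then hand the repaired string to URLDecoder. The names LenientUrlDecode and decodeLeniently are hypothetical and UTF-8 is an assumption. This is not equivalent to unescape: URLDecoder also turns '+' into a space, decodes bytes as UTF-8, and leaves %uXXXX sequences undecoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

final class LenientUrlDecode {
    // Matches a '%' that is NOT followed by two hex digits.
    private static final Pattern STRAY_PERCENT =
            Pattern.compile("%(?![0-9A-Fa-f]{2})");

    // Hypothetical helper: repair stray '%' characters, then decode strictly.
    static String decodeLeniently(String s) {
        String repaired = STRAY_PERCENT.matcher(s).replaceAll("%25");
        try {
            return URLDecoder.decode(repaired, "UTF-8"); // charset is an assumption
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }
}
```

With this, decodeLeniently("http://google.com/search/q=%3C%=%20name%20%%3E") gives back the original string from the question, but an input such as "a+b" comes out as "a b", which unescape would not do.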

Upvotes: 2

Views: 148

Answers (1)

Evan Jones

Reputation: 886

I had a quick look around some Apache libraries and came up against the same issue. Interestingly enough, when I followed up in the ECMAScript Language Spec, I found pseudocode for the unescape() function. You can see this at https://tc39.github.io/ecma262/#sec-unescape-string

It's easy enough to put together a simplistic implementation of this (see below) and at least for the example in your question the output matches.

Now this code is in no way optimized and I haven't thought about whether character encoding is relevant, but it may be a less painful way forward than trying to wrestle things out with regex.

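// Decodes %XX and %uXXXX sequences; any other '%' is copied through unchanged.
public static String unescape(String s) {
    StringBuilder r = new StringBuilder();
    for (int i = 0; i < s.length();) {
        if (s.charAt(i) == '%') {
            // %uXXXX: four hex digits give one UTF-16 code unit
            if (looksLikeUnicode(s, i)) {
                r.append((char) fromHex(s, i + 2, i + 5));
                i += 6;
                continue;
            }
            // %XX: two hex digits give a code unit in the range 0x00..0xFF
            if (looksLikeAscii(s, i)) {
                r.append((char) fromHex(s, i + 1, i + 2));
                i += 3;
                continue;
            }
        }
        // Not a valid escape sequence: keep the character as-is
        r.append(s.charAt(i));
        i += 1;
    }
    return r.toString();
}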

private static boolean looksLikeUnicode(String s, int i) {
    return (i + 5 < s.length()) && (s.charAt(i + 1) == 'u') && areHexDigits(s, i + 2, i + 5);
}

private static boolean looksLikeAscii(String s, int i) {
    return (i + 2 < s.length()) && areHexDigits(s, i + 1, i + 2);
}

private static boolean areHexDigits(String s, int from, int to) {
    for (int i = from; i <= to; ++i) {
        if (isNotHexDigit(s.charAt(i))) {
            return false;
        }
    }
    return true;
}

private static boolean isHexDigit(char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
}

private static boolean isNotHexDigit(char c) {
    return !isHexDigit(c);
}

private static int fromHex(String s, int from, int to) {
    return Integer.parseInt(s.substring(from, to + 1), 16);
}
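As a cross-check, the same left-to-right scan can also be written more compactly with java.util.regex. This is only a sketch of an alternative, not a replacement for the code above; the class name UnescapeSketch is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnescapeSketch {
    // Either %uXXXX (group 1) or %XX (group 2); the longer form is tried first.
    private static final Pattern ESCAPE =
            Pattern.compile("%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})");

    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy everything between the previous match and this one unchanged.
            out.append(s, last, m.start());
            String hex = (m.group(1) != null) ? m.group(1) : m.group(2);
            // Each escape becomes exactly one UTF-16 code unit.
            out.append((char) Integer.parseInt(hex, 16));
            last = m.end();
        }
        // Copy the tail, including any stray '%' characters.
        out.append(s, last, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("http://google.com/search/q=%3C%=%20name%20%%3E"));
    }
}
```

For the string in the question this prints http://google.com/search/q=<%= name %>, and inputs like "100%" or "%u0041%41%" pass their stray '%' through untouched.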

Upvotes: 1
