Rodrigo

Reputation: 4395

How to detect if a string is encoded with escape() or encodeURIComponent()

I have a web service that receives data from various clients. Some of them send the data encoded using escape(), while others use encodeURIComponent(). Is there a way to detect which encoding was used to escape the data?

Upvotes: 13

Views: 30212

Answers (6)

atfede

Reputation: 411

Maybe not the most performant, but this function will recursively decode the encoded string until it cannot decode it anymore.

function decodeValue(str) {
    const decodedStr = decodeURIComponent(str);

    if (decodedStr === str) {
        return decodedStr; // Base case: no more decoding needed
    } else {
        return decodeValue(decodedStr); // String is encoded. Recur with the decoded value
    }
}

decodeValue("%253Ctable class='table-1'%253E%253Ctbody%253E%253Ctr%253E%253Ctd%253Esdfsd%253C/td%253E%253Ctd%253Esdfsd%253C/td%253E%253C/tr%253E%253Ctr%253E%253Ctd%253Esdfsd%253C/td%253E%253Ctd%253Esdfs%253C/td%253E%253C/tr%253E%253C/tbody%253E%253C/table%253E");

In this example the string is decoded twice, since it was encoded two times; a third call then finds nothing left to decode and returns.
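As a rough sketch of the same idea without recursion: the loop below stops when decoding changes nothing, when decodeURIComponent throws, or after a fixed number of rounds. The name decodeFully and the maxRounds parameter are made up for this example, not part of any API.

```javascript
// Hypothetical helper: decode repeatedly, but never throw and never loop forever.
function decodeFully(str, maxRounds = 5) {
  let current = str;
  for (let i = 0; i < maxRounds; i++) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch (e) {
      // URIError: stray "%" or bytes that are not valid UTF-8; keep what we have.
      return current;
    }
    if (next === current) {
      return current; // nothing left to decode
    }
    current = next;
  }
  return current;
}

console.log(decodeFully("%253Ctd%253E")); // "<td>"
console.log(decodeFully("100%"));         // "100%" (malformed input is returned unchanged)
```

One caveat with any decode-until-stable approach: it cannot tell double encoding from data that legitimately contains a percent sequence. The literal text %41, encoded once as %2541, comes back as "A".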

Upvotes: 0

Dudi

Reputation: 3079

Thanks to @mika for the great answer. Maybe just one improvement, since the unescape function is considered deprecated:

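// Declared here in case the project's typings do not expose the deprecated unescape().
declare function unescape(s: string): string;

function decodeURItoString(str: string): string {
    let resp = str;

    try {
        // Note: decodeURI leaves escapes of reserved characters (e.g. %2F) intact.
        resp = decodeURI(str);
    } catch (e) {
        console.log('ERROR: Cannot decodeURI string!');

        // Fall back to unescape() only where the runtime still provides it.
        if (typeof unescape === 'function') {
            resp = unescape(str);
        }
    }

    return resp;
}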

Upvotes: 3

Dejan Janjušević

Reputation: 3230

I realize this is an old question, but I am unaware of a better solution. So I do it like this (thanks to a comment by RobertPitt above):

function isEncoded(str) {
    return typeof str == "string" && decodeURIComponent(str) !== str;
}

I have not yet encountered a case where this failed, which doesn't mean that such a case doesn't exist. Maybe someone could shed some light on this.
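For what it's worth, a few inputs do seem to trip this check: decodeURIComponent throws a URIError on a stray "%" and on escape() output such as %E4, and a string made only of unreserved characters looks identical before and after encoding. Below is a minimal sketch of a non-throwing variant; the name isEncodedSafe is made up here, and treating undecodable input as "not encoded" is an assumption, not the only reasonable choice.

```javascript
// Hypothetical variant of isEncoded() that never throws.
function isEncodedSafe(str) {
  if (typeof str !== "string") return false;
  try {
    return decodeURIComponent(str) !== str;
  } catch (e) {
    // URIError: the input is not valid encodeURIComponent() output.
    return false;
  }
}

console.log(isEncodedSafe("a%20b")); // true
console.log(isEncodedSafe("100%"));  // false (the original version throws here)
console.log(isEncodedSafe("%E4"));   // false (escape("ä"), not valid UTF-8)
console.log(isEncodedSafe("hello")); // false (encoding "hello" changes nothing)
```

Note that a false result only means "decodeURIComponent did not change it"; it says nothing about whether escape() was used.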

Upvotes: 14

mika

Reputation: 6972

This won't help on the server side, but on the client side I have used JavaScript exceptions to detect whether the URL encoding produced ISO Latin or UTF-8 bytes.

decodeURIComponent throws an exception on invalid UTF8 sequences.

let result;
try {
    result = decodeURIComponent(string);
} catch (e) {
    // Not valid UTF-8: fall back to the Latin-1 based unescape()
    result = unescape(string);
}

For example, the ISO Latin encoded umlaut 'ä' (%E4) will throw an exception in Firefox, but the UTF-8 encoded 'ä' (%C3%A4) will not.
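Here is the umlaut example as a small runnable sketch. The wrapper name decodeWithFallback and the returned decoder field are invented for illustration; they just record which branch produced the result.

```javascript
// Hypothetical wrapper around the try/catch above that also reports the decoder used.
function decodeWithFallback(s) {
  try {
    return { value: decodeURIComponent(s), decoder: "decodeURIComponent" };
  } catch (e) {
    // Invalid UTF-8 sequence: treat the bytes as Latin-1 via the deprecated unescape().
    return { value: unescape(s), decoder: "unescape" };
  }
}

const utf8 = decodeWithFallback("%C3%A4");
const latin1 = decodeWithFallback("%E4");

console.log(utf8.value, utf8.decoder);     // ä decodeURIComponent
console.log(latin1.value, latin1.decoder); // ä unescape
```

This is a heuristic, not a proof of which encoder was used: some Latin-1 text happens to form valid UTF-8 when escaped. For instance, escape("Ã¤") is also %C3%A4, which the wrapper would decode as "ä".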

Upvotes: 16

ZZ Coder

Reputation: 75496

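For plain ASCII data you usually don't have to differentiate them. escape() is a form of percent encoding; it differs from encodeURIComponent() only in which characters it encodes and how. For example, escape() leaves + and @ as they are, while encodeURIComponent() turns them into %2B and %40; both encode a space as %20 (the + for a space comes from form encoding, not from either function). Once decoded, ASCII input gives the same value either way. Non-ASCII characters do differ: escape() emits Latin-1 bytes or %uXXXX sequences, while encodeURIComponent() emits UTF-8 bytes.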
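A quick way to check what the two functions actually emit is to run a handful of characters through both. The helper name compareEncodings is made up for this sketch; escape() and encodeURIComponent() are the standard globals.

```javascript
// Hypothetical helper: show escape() and encodeURIComponent() output side by side.
function compareEncodings(chars) {
  return chars.map((ch) => ({
    ch,
    escape: escape(ch),
    uriComponent: encodeURIComponent(ch),
  }));
}

const rows = compareEncodings([" ", "+", "@", "/", "!", "\u00e4", "\u20ac"]);
for (const row of rows) {
  console.log(JSON.stringify(row.ch), row.escape, row.uriComponent);
}
// " "  %20     %20
// "+"  +       %2B
// "@"  @       %40
// "/"  /       %2F
// "!"  %21     !
// "ä"  %E4     %C3%A4
// "€"  %u20AC  %E2%82%AC
```

So the decoded values agree for ASCII input, but for anything outside ASCII the decoder has to match the encoder.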

Upvotes: 0

Derek Swingley

Reputation: 8752

Encourage your clients to use encodeURIComponent(). See this page for an explanation: Comparing escape(), encodeURI(), and encodeURIComponent(). If you really want to try to figure out exactly how something was encoded, you can try to look for some of the characters that escape() and encodeURI() do not encode.
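A rough sketch of that character-based guess follows. It assumes every client used exactly one of escape() or encodeURIComponent() (no form encoding, no double encoding); the function name guessEncoder and its return values are made up for this example. It relies on escape() leaving @ + / literal while encoding ! ~ ' ( ), and encodeURIComponent() doing the opposite.

```javascript
// Hypothetical heuristic: guess which function produced an encoded string.
function guessEncoder(encoded) {
  // %uXXXX sequences are only produced by escape().
  if (/%u[0-9a-f]{4}/i.test(encoded)) return "escape";

  // escape() leaves @ + / literal and encodes ! ~ ' ( ).
  if (/[@+\/]/.test(encoded) || /%(21|7E|27|28|29)/i.test(encoded)) return "escape";

  // encodeURIComponent() leaves ! ~ ' ( ) literal and encodes @ + /.
  if (/[!~'()]/.test(encoded) || /%(40|2B|2F)/i.test(encoded)) return "encodeURIComponent";

  try {
    decodeURIComponent(encoded);
  } catch (e) {
    return "escape"; // the bytes are not valid UTF-8, so Latin-1 from escape() is likely
  }

  // Valid multi-byte UTF-8 is most likely encodeURIComponent() output (not guaranteed).
  if (/%[89a-f][0-9a-f]/i.test(encoded)) return "encodeURIComponent";

  return "unknown"; // e.g. "a%20b" looks the same from both functions
}

console.log(guessEncoder(escape("a@b")));             // "escape"
console.log(guessEncoder(encodeURIComponent("a@b"))); // "encodeURIComponent"
console.log(guessEncoder("%E4"));                     // "escape"
console.log(guessEncoder("%C3%A4"));                  // "encodeURIComponent"
console.log(guessEncoder("a%20b"));                   // "unknown"
```

When the result is "unknown" the two encodings are byte-for-byte identical for that input, so either decoder gives the same value.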

Upvotes: 8
