Reputation: 69
Hex (Base16) encoded bytes are decoded by a Base64 decoder without throwing an exception. How can I tell that the data was actually encoded with a Base16 encoder?
org.apache.commons.codec.binary.Base64.decodeBase64(bytesencodedwithHex);
When the bytes passed to the method above are hex-encoded data, the method does not throw an exception or otherwise indicate that the input was hex-encoded. Even org.apache.commons.codec.binary.Base64.isBase64(bytesencodedwithHex) returns true.
In the example below, the string "Hello" is hex-encoded, and decoding the result with Base64 yields nonsense. How can I let my client know that they are using the wrong decoder in this case?
System.out.println(new String(org.bouncycastle.util.encoders.Hex.encode("Hello".getBytes())));
System.out.println(new String(org.bouncycastle.util.encoders.Base64.decode("48656c6c6f".getBytes())));
Upvotes: 0
Views: 578
Reputation: 109567
Some strings are valid as both Base64 and Base16, and the string alone gives no way to tell which was meant.
But there are clues: in a hex string the characters /, +, G-Z and g-z never appear. So:
// Requires: import java.util.Locale;
boolean probablyHex(String s) {
    if (s.endsWith("=")) { // Base64 padding char (optional).
        return false;
    }
    // Drop everything outside the Base64 alphabets (standard and URL-safe variant).
    s = s.replaceAll("[^-_+/A-Za-z0-9]", "");
    if (s.matches(".*[-_+/G-Zg-z].*")) { // Characters that never occur in hex.
        return false;
    }
    int n = s.length();
    if (n % 2 == 1) { // Hex always uses two digits per byte.
        return false;
    }
    if (n % 3 == 1) { // Spurious char with 6 bits data.
        return true;
    }
    // Very unlikely that it is Base64, but you might have a bias towards Base64:
    if (!s.equals(s.toUpperCase(Locale.US)) && !s.equals(s.toLowerCase(Locale.US))) {
        // Mixed cases in A-Fa-f:
        // For small texts that is significantly incoherent, meaning Base64.
        return n > 32;
    }
    return true;
}
Upvotes: 1
Reputation: 1207
Every hexadecimal string is also a legitimate Base64 string, at least as far as its characters are concerned.
Hex encoding gives you a string that represents the bytes of the original string and consists of the characters 0-9 and A-F. Base64 encoding gives you a string that encodes the original string and consists only of printable characters (which, of course, include 0-9 and A-F).
So any string made of 0-9 and A-F can be a hexadecimal string, but it can equally be a Base64 string (one that happens to contain only 0-9 and A-F).
You will need a different way to tell the user which encoding was used. One option is to send a field naming the encoding type together with the string. Another is to send the length of the original string, so that a wrong length after decoding shows that the wrong encoding was assumed.
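To make the overlap concrete, here is a minimal sketch that uses the JDK's own java.util.Base64 instead of the Commons Codec or Bouncy Castle classes from the question. The class name HexAsBase64Demo and its helper methods are made up for illustration. It hex-encodes "Hello" and then checks whether the basic Base64 decoder accepts the result:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; not part of any library.
class HexAsBase64Demo {
    // Hex-encodes bytes with lowercase digits, using only the JDK.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Returns true if the JDK's basic Base64 decoder accepts the text without throwing.
    static boolean decodesAsBase64(String text) {
        try {
            Base64.getDecoder().decode(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String hex = toHex("Hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(hex);                  // the hex form of "Hello"
        System.out.println(decodesAsBase64(hex)); // whether Base64 accepts it
    }
}
```

The decoder has no way of knowing that the input was meant as hex, so a successful decode says nothing about which encoder produced the string.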
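One possible shape for the "encoding type together with the string" idea is a simple text prefix. The sketch below is only an illustration: the class name TaggedPayload, the "hex:" and "base64:" tags, and the colon separator are all assumptions, not an established format.

```java
import java.util.Base64;

// Hypothetical helper: prefixes each payload with the name of its encoding.
class TaggedPayload {
    static String encodeHex(byte[] data) {
        StringBuilder sb = new StringBuilder("hex:");
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static String encodeBase64(byte[] data) {
        return "base64:" + Base64.getEncoder().encodeToString(data);
    }

    // Reads the tag and dispatches to the matching decoder.
    static byte[] decode(String tagged) {
        int colon = tagged.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing encoding tag");
        }
        String tag = tagged.substring(0, colon);
        String body = tagged.substring(colon + 1);
        switch (tag) {
            case "hex":
                return fromHex(body);
            case "base64":
                return Base64.getDecoder().decode(body);
            default:
                throw new IllegalArgumentException("Unknown encoding tag: " + tag);
        }
    }

    // Decodes pairs of hex digits into bytes; rejects odd lengths and non-hex characters.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex input must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

With this, TaggedPayload.decode("hex:48656c6c6f") and TaggedPayload.decode("base64:SGVsbG8=") both yield the bytes of "Hello", and a client that omits or misspells the tag gets an explicit error instead of silently decoded garbage.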
Upvotes: 2