Reputation: 1
I am trying to standardize a set of data. Some of the names were UTF-8 encoded, others were not. What I need to do in Java is detect whether a name was UTF-8 encoded, using some form of conditional logic, so that I can translate each row correctly.
String s1 = "José Flores";    // already correct
String s1 = "JosÃ© Flores";   // UTF-8 bytes that were read as ISO-8859-1
IF [condition] (identify UTF-8)
byte[] utf8Bytes = s1.getBytes("ISO-8859-1");
String s2 = new String(utf8Bytes,"UTF-8");
ELSE
String s2 = s1;
Upvotes: 0
Views: 413
Reputation: 1163
With the help of juniversalchardet, you can detect the encoding and then branch on the result. This method guesses the encoding of a byte array:
public static String guessEncoding(byte[] bytes) {
    // Fall back to UTF-8 when the detector cannot decide.
    String DEFAULT_ENCODING = "UTF-8";
    org.mozilla.universalchardet.UniversalDetector detector =
        new org.mozilla.universalchardet.UniversalDetector(null);
    // Feed the whole array to the detector, then signal end of input.
    detector.handleData(bytes, 0, bytes.length);
    detector.dataEnd();
    // getDetectedCharset() returns null if no charset was identified.
    String encoding = detector.getDetectedCharset();
    detector.reset();
    if (encoding == null) {
        encoding = DEFAULT_ENCODING;
    }
    return encoding;
}
This requires juniversalchardet-1.0.3.jar. Also, here is some info
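If you would rather not add a dependency, a different technique that uses only the JDK may be enough for the case in the question. This is a rough sketch, not the juniversalchardet approach: it assumes the broken names are UTF-8 bytes that were decoded as ISO-8859-1, and it treats a name as broken only if its ISO-8859-1 bytes form strictly valid UTF-8. The class name `NameFixer` and the method name `fixIfMisdecodedUtf8` are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class NameFixer {

    // Returns the repaired name if s looks like UTF-8 bytes that were
    // decoded as ISO-8859-1; otherwise returns s unchanged.
    static String fixIfMisdecodedUtf8(String s) {
        // Pure ASCII is identical in both encodings, so nothing to repair.
        if (s.chars().allMatch(c -> c < 0x80)) {
            return s;
        }
        // Characters above U+00FF cannot come from an ISO-8859-1 decode.
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return s;
        }
        // Recover the original bytes, then decode them strictly as UTF-8.
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the string was already correct.
            return s;
        }
    }
}
```

This works because a correctly stored accented letter such as é becomes a single byte that is not valid UTF-8 on its own, so the strict decode fails and the name is left alone. It is still a heuristic: names that were mangled through Windows-1252 rather than ISO-8859-1 can contain characters this check skips, and a legitimate string can in rare cases happen to look like valid UTF-8.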
Upvotes: 1