Reputation: 1990
I work with different locales and have been encountering issues with string validation lately. In many cases, the strings I get from the translation team are somehow different from what we receive from the website, even though they look identical.
For example:
import java.util.Arrays;

import org.apache.commons.codec.binary.StringUtils;
import org.junit.Assert;
import org.junit.Test;

@Test
public void test() {
    String actual = "pouze v angličtině.";
    String expected = "pouze v angličtině.";
    // Encode both strings as UTF-8 so the raw bytes can be compared.
    byte[] bytesActual = StringUtils.getBytesUtf8(actual);
    byte[] bytesExpected = StringUtils.getBytesUtf8(expected);
    System.out.println(Arrays.toString(bytesActual));
    System.out.println(Arrays.toString(bytesExpected));
    // assertArrayEquals compares array contents; assertEquals on arrays
    // only compares references and would fail even for identical bytes.
    Assert.assertArrayEquals(bytesActual, bytesExpected);
}
The test fails with the result:
[112, 111, 117, 122, 101, 32, 118, -62, -96, 97, 110, 103, 108, 105, -60, -115, 116, 105, 110, -60, -101, 46]
[112, 111, 117, 122, 101, 32, 118, 32, 97, 110, 103, 108, 105, -60, -115, 116, 105, 110, -60, -101, 46]
The test never passes, since there are a few different bytes after the "v". The front-end person reads the same JSON as I do for the tests, but somehow the characters get malformed. Currently, my workaround is to open the JSON, copy/paste what I got from the website, and replace the string, but since this happens over and over, I'm looking for a programmatic solution instead of replacing the JSON strings by hand. Any ideas? Thank you.
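One way to see which character actually differs is to print the code points, since the mismatch is invisible when the strings are rendered. A minimal sketch of such a check (plain java.lang only, using the same actual/expected strings as above):

// Walk both strings and report any position where the code units differ.
int len = Math.min(actual.length(), expected.length());
for (int i = 0; i < len; i++) {
    char a = actual.charAt(i);
    char e = expected.charAt(i);
    if (a != e) {
        // Prints e.g. "index 7: actual=U+00A0 expected=U+0020".
        System.out.printf("index %d: actual=U+%04X expected=U+%04X%n", i, (int) a, (int) e);
    }
}
if (actual.length() != expected.length()) {
    System.out.println("lengths differ: " + actual.length() + " vs " + expected.length());
}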
Upvotes: 1
Views: 181
Reputation: 15
I would say that is most likely due to the character encoding. I would double-check that the character encoding on the front end is as expected, and if you go ahead and handle this programmatically, you can then enforce your own set of encoding rules.
As a general rule, anything outside the 0–127 (ASCII) range is introduced by character sets that add characters beyond the standard ASCII table.
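In your output, the extra bytes -62, -96 in the first array are the UTF-8 encoding of U+00A0, a non-breaking space, which renders exactly like a regular space. A minimal sketch of one way to enforce your own rules before comparing, assuming NFKC compatibility normalization is acceptable for your data (it folds U+00A0 and other look-alike characters into their plain equivalents); the helper name normalizeForComparison is just illustrative:

import java.text.Normalizer;

public static String normalizeForComparison(String s) {
    // NFKC maps compatibility characters such as U+00A0 (no-break space)
    // to their plain equivalents, here a regular U+0020 space.
    return Normalizer.normalize(s, Normalizer.Form.NFKC);
}

Then compare normalizeForComparison(expected) against normalizeForComparison(actual) in the test. If the only offender you ever hit is the non-breaking space, a plain s.replace('\u00A0', ' ') would do as well.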
Upvotes: 1