Reputation: 1990
I work with different locales and have been encountering issues with string validation lately. In many cases, the strings I get from the translation team are somehow different from what we receive from the website, even though they look identical.
For example:
import java.util.Arrays;

import org.apache.commons.codec.binary.StringUtils;
import org.junit.Assert;
import org.junit.Test;

@Test
public void test() {
    String actual = "pouze v angličtině.";
    String expected = "pouze v angličtině.";
    // Encode both strings as UTF-8 so the raw bytes can be compared.
    byte[] bytesActual = StringUtils.getBytesUtf8(actual);
    byte[] bytesExpected = StringUtils.getBytesUtf8(expected);
    System.out.println(Arrays.toString(bytesActual));
    System.out.println(Arrays.toString(bytesExpected));
    // assertArrayEquals compares array contents; assertEquals on arrays
    // only compares references and would fail even for identical bytes.
    Assert.assertArrayEquals(bytesActual, bytesExpected);
}
The test fails with the result:
[112, 111, 117, 122, 101, 32, 118, -62, -96, 97, 110, 103, 108, 105, -60, -115, 116, 105, 110, -60, -101, 46]
[112, 111, 117, 122, 101, 32, 118, 32, 97, 110, 103, 108, 105, -60, -115, 116, 105, 110, -60, -101, 46]
The test never passes, since there are a few different bytes after the "v". The front-end person reads the same JSON as I do for the tests, but somehow the characters get malformed. Currently, my workaround is to open the JSON, copy/paste what I got from the website, and replace the string, but since this happens over and over, I'm looking for a programmatic solution instead of replacing the JSON strings by hand. Any ideas? Thank you.
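One way to see which character actually differs is to print the code points, since the mismatch is invisible when the strings are rendered. A minimal sketch of such a check (plain java.lang only, using the same actual/expected strings as above):

// Walk both strings and report any position where the code units differ.
int len = Math.min(actual.length(), expected.length());
for (int i = 0; i < len; i++) {
    char a = actual.charAt(i);
    char e = expected.charAt(i);
    if (a != e) {
        // Prints e.g. "index 7: actual=U+00A0 expected=U+0020".
        System.out.printf("index %d: actual=U+%04X expected=U+%04X%n", i, (int) a, (int) e);
    }
}
if (actual.length() != expected.length()) {
    System.out.println("lengths differ: " + actual.length() + " vs " + expected.length());
}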
Upvotes: 1
Views: 181
Reputation: 15
I would say that is most likely due to the character encoding. I would double-check that the character encoding on the front end is as expected, and if you go ahead and handle this programmatically, you can then enforce your own set of encoding rules.
As a general rule, anything outside the 0–127 (ASCII) range is introduced by character sets that add characters beyond the standard ASCII table.
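In your output, the extra bytes -62, -96 in the first array are the UTF-8 encoding of U+00A0, a non-breaking space, which renders exactly like a regular space. A minimal sketch of one way to enforce your own rules before comparing, assuming NFKC compatibility normalization is acceptable for your data (it folds U+00A0 and other look-alike characters into their plain equivalents); the helper name normalizeForComparison is just illustrative:

import java.text.Normalizer;

public static String normalizeForComparison(String s) {
    // NFKC maps compatibility characters such as U+00A0 (no-break space)
    // to their plain equivalents, here a regular U+0020 space.
    return Normalizer.normalize(s, Normalizer.Form.NFKC);
}

Then compare normalizeForComparison(expected) against normalizeForComparison(actual) in the test. If the only offender you ever hit is the non-breaking space, a plain s.replace('\u00A0', ' ') would do as well.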
Upvotes: 1