korallo

Reputation: 97

Jackson ObjectMapper cannot read UTF-8

As the title says, Jackson fails to read a file that contains non-ASCII characters.

Line 37:

ArrayNode arrayNode1 = objectMapper.readValue(bansFile, ArrayNode.class);

21:48:55 [SEVERE] com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xb3 at [Source: (File); line: 18, column: 38]

Here is line 18 of the file; the parser fails on the character "ł":

"reason" : "Administrator nie podał powodu banicji"

Full stack trace:

21:48:55 [SEVERE]     at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1840)
21:48:55 [SEVERE]     at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:712)
21:48:55 [SEVERE]     at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3569)
21:48:55 [SEVERE]     at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:3565)
21:48:55 [SEVERE]     at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2511)
21:48:55 [SEVERE]     at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2437)
21:48:55 [SEVERE]     at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:293)
21:48:55 [SEVERE]     at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:267)
21:48:55 [SEVERE]     at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeArray(JsonNodeDeserializer.java:437)
21:48:55 [SEVERE]     at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer$ArrayDeserializer.deserialize(JsonNodeDeserializer.java:141)
21:48:55 [SEVERE]     at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer$ArrayDeserializer.deserialize(JsonNodeDeserializer.java:126)
21:48:55 [SEVERE]     at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4202)
21:48:55 [SEVERE]     at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3070)
21:48:55 [SEVERE]     at koral.proxyban.listeners.ServerConnect.isBanned(ServerConnect.java:37)
21:48:55 [SEVERE]     at koral.proxyban.listeners.ServerConnect.onProxyConnect(ServerConnect.java:25)

Upvotes: 3

Views: 9855

Answers (2)

Federico Paparoni

Reputation: 722

The problem isn't related to Jackson: the encodings JSON accepts are UTF-8, UTF-16 and UTF-32.

If you write the file yourself, you can save it as UTF-8 using:

OutputStreamWriter writer = new OutputStreamWriter(
                  new FileOutputStream("yourfile"), StandardCharsets.UTF_8);

If the file is created by another source, you must read it with the correct encoding:

BufferedReader br = new BufferedReader(new InputStreamReader(
                   new FileInputStream("yourfile"), SOME_CHARSET));

and then save the contents as UTF-8; otherwise Jackson will not accept it.
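
The read-then-save step can be sketched as follows. This is only an illustration: the class and method names are made up, and the source charset still has to be known in advance (ISO-8859-2 is assumed in the usage line below).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Jackson or the JDK.
final class Transcoder {
    // Decodes the raw bytes with the charset they were really written in,
    // then re-encodes the same text as UTF-8.
    static byte[] toUtf8(byte[] raw, Charset sourceCharset) {
        return new String(raw, sourceCharset).getBytes(StandardCharsets.UTF_8);
    }

    // File-based variant: reads the whole source file and writes a UTF-8 copy.
    static void convertFile(Path source, Charset sourceCharset, Path target) throws IOException {
        Files.write(target, toUtf8(Files.readAllBytes(source), sourceCharset));
    }
}
```

For example, Transcoder.convertFile(Paths.get("yourfile"), Charset.forName("ISO-8859-2"), Paths.get("yourfile-utf8")) would produce a copy that Jackson can read directly.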

Upvotes: 1

Kayaman

Reputation: 73548

No, the error message is saying that the data is not UTF-8.

It looks to be ISO-8859-2 (Latin-2) or an equivalent encoding, based on the fact that the offending character ł is encoded as the byte 0xb3.

Your choices depend on many things. If your data is coming from an outside source, you may have no say in the encoding (or you may contact the data supplier and ask them to provide the data in UTF-8). In that case you would have to do something like:
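
If you want to confirm that the bytes really are not UTF-8 before blaming the parser, a strict decode will tell you. A minimal sketch using the JDK's CharsetDecoder; the class and method names are made up for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking data before handing it to a JSON parser.
final class Utf8Check {
    // Returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Fail on bad input instead of silently substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A lone 0xb3 byte is a UTF-8 continuation byte with no lead byte before it, so this check rejects it, which matches the "Invalid UTF-8 start byte" message.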

BufferedReader br = new BufferedReader(new InputStreamReader(
               new FileInputStream("yourfile"), "ISO-8859-2"));
objectMapper.readValue(br, ArrayNode.class);

In this case the InputStreamReader will correctly convert the bytes to chars, and Jackson won't have to deal with bytes at all (just text). But it also requires you to know that the file is encoded using ISO-8859-2 (i.e. Latin-2).

There are ways to guess a file's encoding, but it cannot be done reliably by a program, so you can't just say "open the file in the correct encoding". The way I debugged this problem was to look up common Polish encodings and then see in which of them ł is encoded as 0xb3, the byte from the error message.
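
That lookup can also be done in code when you already know one character the file should contain. A rough sketch, not a general encoding detector; the class name, method name and candidate list are made up for illustration:

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical debugging aid: tests which charsets map a byte to a known character.
final class CharsetProbe {
    // Returns the candidate charset names that decode the single byte b
    // to exactly the expected character.
    static List<String> matching(byte b, char expected, String... candidates) {
        List<String> hits = new ArrayList<>();
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // skip charsets this JVM does not provide
            }
            String decoded = new String(new byte[] {b}, Charset.forName(name));
            if (decoded.length() == 1 && decoded.charAt(0) == expected) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

Calling CharsetProbe.matching((byte) 0xB3, '\u0142', "ISO-8859-1", "ISO-8859-2", "windows-1250", "UTF-8") should return ISO-8859-2 and windows-1250 on a JDK that ships both, which is why the answer says "or equivalent": a single byte cannot tell those two apart.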

Unfortunately, many methods in the Java API use the "default platform encoding", which is not always UTF-8. So you may write a file that you believe is UTF-8 when it is not, because you forgot to specify the encoding explicitly, as in new OutputStreamWriter(new FileOutputStream("yourfile"), StandardCharsets.UTF_8).

This applies to every place where bytes are converted to characters and vice versa, for example file access and reading text from a network socket.
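
To see what the charset argument actually changes, here is a small sketch that writes the same text through an OutputStreamWriter with an explicit charset and returns the resulting bytes. The class and method names are made up, and an in-memory stream stands in for the file:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical demo class; a real program would wrap a FileOutputStream instead.
final class ExplicitEncoding {
    // Encodes text with the given charset, never with the platform default.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) before the bytes are read.
        return out.toByteArray();
    }
}
```

With StandardCharsets.UTF_8 the character ł (U+0142) comes out as the two bytes 0xc5 0x82; with ISO-8859-2 it comes out as the single byte 0xb3, which is exactly what Jackson tripped over.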

Upvotes: 5
