Reputation: 2183

How to prevent ObjectMapper from converting escaped unicode?

I'm using Jackson 2.4 in Java to do some JSON legwork. I make a call to a remote server with Apache HttpGet, deserialize the results with Jackson into a POJO, manipulate those results, and then serialize them with Jackson to push back to a remote server with HttpPost.

The issue I'm finding is that Jackson translates Unicode escape sequences into the characters they represent, which I need it not to do because of encoding issues on each end. For example, I might have this in the JSON:

"field1": "\u00a2"

But Jackson converts "\u00a2" to "¢" when it's deserialized, which causes problems with the remote server: the value has to stay as escaped Unicode. If I use something like Apache EntityUtils (specifying UTF-8), or even make the call from my web browser, the escaped Unicode is preserved, so I know it's coming in properly from the server. If I have Jackson consume the input stream from the response entity, it does the conversion automatically.

I've tried writing the HttpPost body with a JsonGenerator explicitly set to UTF-8. It didn't work; the remote server still rejected it. I've dug through the configuration options for ObjectMapper and JsonParser, but I don't see anything that would override this behavior. Escaping non-ASCII, sure, but that's not what I need to do here. Maybe I'm missing something obvious, but I can't get Jackson to deserialize this string without replacing the escaped Unicode.

EDIT: My mistake: the only literals causing problems have 3 or 5 leading backslashes, not just one. That's odd, but Java seems to be what unpacks them by default during deserialization, even though the raw text that came back from the server preserves them. I'm still not sure how to get Java to preserve this without checking an insane amount of text.
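Under the standard JSON escape rules, which Jackson follows, a double backslash decodes to one literal backslash and a backslash-u escape decodes to the character it names. So 3 leading backslashes in the raw text become a literal backslash followed by ¢, and 5 become two backslashes followed by ¢. The following is a hand-written, JDK-only decoder that applies those rules to make this visible; JsonEscapeDemo and decodeJsonString are hypothetical names (not Jackson API), and the sketch assumes well-formed input.

```java
public class JsonEscapeDemo {

    // Minimal decoder for the contents of a JSON string literal (without the
    // surrounding quotes). Illustration only, not a validating parser.
    static String decodeJsonString(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char e = s.charAt(++i); // character right after the backslash
            switch (e) {
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'u':
                    // 'u' escape: the next four hex digits give one UTF-16 code unit
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("Bad escape: \\" + e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Raw JSON text with 1, 3 and 5 backslashes in front of u00a2
        String one   = "\\u00a2";
        String three = "\\\\\\u00a2";
        String five  = "\\\\\\\\\\u00a2";
        System.out.println(decodeJsonString(one));   // the cent sign only
        System.out.println(decodeJsonString(three)); // one backslash, then the cent sign
        System.out.println(decodeJsonString(five));  // two backslashes, then the cent sign
    }
}
```

In other words, the 3- and 5-backslash values are not being mangled: an escaped backslash and a Unicode escape are both decoded, exactly as the JSON text says.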

Upvotes: 6

Answers (3)

eitama

Reputation: 1507

Spent a few hours looking for a solution, and I found it.

I had some binary data, for example 0xab 0xa6 0xaa, and I wanted my JSON to look like this:

{
  "binary-data-as-unicode": "\u00ab\u00a6\u00aa"
}

The reader of this JSON would strip the \u00 prefix and treat what's left as a hex string representing the binary data.

To use Jackson's ObjectMapper, I prepared a String containing the hex data in Unicode-escape notation:

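import java.io.ByteArrayOutputStream;

// Formats each byte as a Unicode escape (backslash, 'u', four uppercase hex digits)
// and wraps the result in double quotes so it can be used as a raw JSON string value.
public static String toHexString(ByteArrayOutputStream stream) {
    byte[] byteArray = stream.toByteArray();
    StringBuilder hexString = new StringBuilder();
    for (byte b : byteArray) {
        hexString.append(String.format("\\u%04X", b & 0xFF));
    }
    return String.format("\"%s\"", hexString.toString());
}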

Then, I used:

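// stream is a ByteArrayOutputStream holding the binary data; mapper is an ObjectMapper
// RawValue is com.fasterxml.jackson.databind.util.RawValue
String unicodedString = toHexString(stream);
ObjectNode objNode = mapper.createObjectNode();
// putRawValue writes the text verbatim, so Jackson does not escape the backslashes again
objNode.putRawValue("field-name", new RawValue(unicodedString));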

This way, there was no further escaping, and I got what I wanted.
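On the reading side, assuming the reader uses a standard JSON parser rather than working on the raw text, it gets back the characters U+00AB, U+00A6 and U+00AA, not the escape text. Since every escape here has the 00XX form, each character is below U+0100, and encoding the string as ISO-8859-1 returns the original bytes. A minimal JDK-only sketch of both directions; HexUnicodeDemo, toUnicodeEscapes and fromDecodedString are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class HexUnicodeDemo {

    // Encodes each byte as a Unicode escape (backslash, 'u', four uppercase hex digits),
    // the same format toHexString produces, minus the surrounding quotes.
    static String toUnicodeEscapes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\u%04X", b & 0xFF));
        }
        return sb.toString();
    }

    // A JSON parser turns the escape for byte value 0xNN into the char U+00NN.
    // ISO-8859-1 maps U+0000..U+00FF one-to-one onto byte values 0x00..0xFF,
    // so encoding the decoded string with it recovers the original bytes.
    static byte[] fromDecodedString(String decoded) {
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xab, (byte) 0xa6, (byte) 0xaa};
        System.out.println(toUnicodeEscapes(data));
        // Simulates the string the reader's JSON parser returns for that value
        String decoded = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(java.util.Arrays.equals(data, fromDecodedString(decoded))); // true
    }
}
```

The b & 0xFF mask matters: without it, negative Java bytes such as 0xab would be sign-extended and formatted as FFAB.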

Upvotes: 0

tianzhipeng

Reputation: 2209

Another way to customize Jackson's behavior is a custom JsonParser. See Jackson's source code for JsonFactory and ReaderBasedJsonParser.

The key method is _finishString2(), which is where escape sequences get decoded (via _decodeEscaped()). So we can write a JsonParser that extends ReaderBasedJsonParser and overrides _finishString2:

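import com.fasterxml.jackson.core.json.ReaderBasedJsonParser;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

// Relies on Jackson internals (ReaderBasedJsonParser, _textBuffer, _icLatin1, ...),
// which are not public API and can differ between 2.x versions.
public class MyJsonParser extends ReaderBasedJsonParser {

    // A constructor delegating to a ReaderBasedJsonParser constructor of your Jackson
    // version is required here; MyJsonParserFactory (not shown) creates this parser.

    @Override
    protected void _finishString2() throws IOException {
        char[] outBuf = _textBuffer.getCurrentSegment();
        int outPtr = _textBuffer.getCurrentSegmentSize();
        final int[] codes = _icLatin1;
        final int maxCode = codes.length;
        boolean afterBackslash = false;

        while (true) {
            if (_inputPtr >= _inputEnd) {
                if (!loadMore()) { // named _loadMore() in later 2.x versions
                    _reportInvalidEOF(": was expecting closing quote for a string value");
                }
            }
            char c = _inputBuffer[_inputPtr++];
            int i = (int) c;
            if (afterBackslash) {
                // Character right after a backslash: copy it as-is, so an escaped
                // quote does not end the string
                afterBackslash = false;
            } else if (i < maxCode && codes[i] != 0) {
                if (i == INT_QUOTE) {
                    break; // unescaped closing quote
                }
                if (c == '\\') {
                    // The stock parser calls _decodeEscaped() here; instead keep the
                    // backslash (and, on the next pass, the character after it) verbatim
                    afterBackslash = true;
                }
            }
            // Need more room?
            if (outPtr >= outBuf.length) {
                outBuf = _textBuffer.finishCurrentSegment();
                outPtr = 0;
            }
            // Ok, let's add char to output:
            outBuf[outPtr++] = c;
        }
        _textBuffer.setCurrentLength(outPtr);
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"field1\": \"\\u00a2\",\"field2\": \"\\u00a2 this\",\"numberField\": 121212}";
        // MyJsonParserFactory: a JsonFactory subclass that returns MyJsonParser instances
        ObjectMapper objectMapper = new ObjectMapper(new MyJsonParserFactory());
        Object o = objectMapper.readValue(json, Object.class);
        System.out.println(o);
    }
}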

Full demo code here
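The core idea, stripped of Jackson internals, can be shown with plain JDK code: read one JSON string literal and keep its escape sequences verbatim instead of decoding them. This is a minimal sketch; RawStringScanDemo and readRawString are hypothetical names, not part of Jackson. It copies the character after each backslash along with the backslash, so an escaped quote does not terminate the string.

```java
public class RawStringScanDemo {

    // Reads a JSON string literal whose opening quote is at json.charAt(start) and
    // returns its contents exactly as written, with escape sequences left undecoded.
    static String readRawString(String json, int start) {
        if (json.charAt(start) != '"') {
            throw new IllegalArgumentException("Expected opening quote at " + start);
        }
        StringBuilder out = new StringBuilder();
        for (int i = start + 1; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c == '"') {
                return out.toString(); // unescaped closing quote
            }
            out.append(c);
            if (c == '\\' && i + 1 < json.length()) {
                // Copy the escaped character too, so a backslash-quote pair is kept whole
                out.append(json.charAt(++i));
            }
        }
        throw new IllegalArgumentException("Missing closing quote");
    }

    public static void main(String[] args) {
        String json = "{\"field1\": \"\\u00a2\", \"field2\": \"say \\\"hi\\\"\"}";
        // Value of field1: the six raw characters backslash, u, 0, 0, a, 2
        System.out.println(readRawString(json, json.indexOf('"', json.indexOf(':'))));
        // Value of field2, escaped quotes included
        System.out.println(readRawString(json, json.indexOf('"', json.lastIndexOf(':'))));
    }
}
```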

Upvotes: 0

Optional

Reputation: 4517

What you are expecting is outside the scope of Jackson. It's Java that converts the String while reading it. For the same reason, if you have a properties file with the value \u00a2 and read it using the JDK API, you will get the converted value. Depending on the size of the input, you can either double-escape the \ character before passing the string to Jackson, or "escape" the string back in your own deserializer (for Strings only), as below:


package com.test.json;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.module.SimpleModule;
import java.io.IOException;
import java.util.Map;
import org.apache.commons.lang.StringEscapeUtils; // commons-lang 2.x

public class Jackson {

    static ObjectMapper _MAPPER = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        String json = "{\"field1\": \"\\u00a2\",\"field2\": \"\\u00a2 this\",\"numberField\": 121212}";
        // Register a custom deserializer that is used for every String value
        SimpleModule testModule
                = new SimpleModule("StOvFl", _MAPPER.version()).addDeserializer(String.class,
                        new EscapingStringDeserializer());

        _MAPPER.registerModule(testModule);

        Map<String, Object> m = _MAPPER.readValue(json, new TypeReference<Map<String, Object>>() {
        });
        System.out.println("m = " + m);
    }
}

// Re-escapes each String after Jackson has decoded it. escapeJava writes uppercase hex
// digits and also escapes quotes, backslashes and control characters.
class EscapingStringDeserializer extends JsonDeserializer<String> {

    @Override
    public String deserialize(JsonParser jp, DeserializationContext ctxt)
            throws IOException, JsonProcessingException {
        String s = jp.getValueAsString();
        return StringEscapeUtils.escapeJava(s);
    }
}
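The first option mentioned above, double-escaping the backslash before Jackson sees the text, can be sketched with plain JDK code. This minimal sketch assumes the whole response body is first read into a String (for example with EntityUtils.toString, as in the question); DoubleEscapeDemo and doubleEscapeUnicode are hypothetical names. Only backslash-u escapes get the extra backslash; every other escape, including an already escaped backslash, is copied unchanged.

```java
public class DoubleEscapeDemo {

    // Adds one backslash in front of every backslash-u escape in raw JSON text, so that
    // after parsing, the field holds the escape text itself instead of the decoded char.
    // A backslash that is itself escaped (a double backslash) is never doubled.
    static String doubleEscapeUnicode(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c == '\\' && i + 1 < json.length()) {
                char next = json.charAt(++i);
                if (next == 'u') {
                    out.append("\\\\u");        // escaped backslash followed by a plain 'u'
                } else {
                    out.append(c).append(next); // keep other escapes as they are
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "{\"field1\": \"\\u00a2\"}";
        System.out.println(doubleEscapeUnicode(raw));
    }
}
```

Keep in mind that Jackson escapes backslashes when it writes string values, so a field holding this literal escape text is serialized with a doubled backslash; the outgoing JSON needs the reverse step, or the value has to be written raw.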

Upvotes: 1
