How to decode a malformed hex string of emoji characters like "`1f1e81f1f3`"?

Question

Suppose there is a hex string for an emoji such as "1f1e81f1f3". It is a malformed concatenation of code point hex values: it is supposed to be two separate strings, 1f1e8 and 1f1f3.

I'm using org.apache.commons.codec.binary.Hex to decode the hex string, but Hex requires the input length to be even, so I need to zero-pad the hex string, like "01f1e801f1f3".

Currently I simply replace "1f" with "01f". So far so good, but an emoji glyph may consist of a sequence of Unicode characters, so:

Is it safe to simply replace "1f" with "01f"?
If it is not safe, how can I decode such a hex string properly and restore it to the correct emoji character or character sequence? It seems I need to implement a custom UTF-16BE decoder?

Background

This hex string is extracted from a "" string in a text message retrieved from a popular IM application via an unofficial HTTP API.

LiuYan 刘研 · Accepted Answer

I ended up writing a small function to restore the emoji characters.

Basic procedure:

1. Set a pointer to the start of the hex string.
2. Read from the pointer position of the hex string:
- If the remaining string starts with "1f", take the next 5 hex digits, pad three zeroes in front of them (giving 8 digits), and advance the pointer by 5. Otherwise, take the next 4 hex digits without padding and advance the pointer by 4.
- Decode the resulting hex string to a byte array.
- Create a new String from the byte array, using the UTF_32BE character encoding for the padded 8-digit case and UTF_16BE for the 4-digit case.
3. Repeat step 2 until the end of the hex string.
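
The steps above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the original function: it uses the standard library instead of the Apache Commons Hex class, the names `EmojiHexDecoder` and `hexToBytes` are hypothetical, and it inherits the "1f" heuristic together with its limitations.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EmojiHexDecoder {
    private static final Charset UTF_32BE = Charset.forName("UTF-32BE");

    /** Decodes an unpadded hex string such as "1f1e81f1f3" group by group. */
    static String decode(String hex) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < hex.length()) {
            // Heuristic: a group starting with "1f" is assumed to be a 5-digit code point.
            boolean fiveDigits = hex.regionMatches(true, pos, "1f", 0, 2);
            int len = fiveDigits ? 5 : 4;
            if (pos + len > hex.length()) {
                throw new IllegalArgumentException("Incomplete group at offset " + pos);
            }
            String group = hex.substring(pos, pos + len);
            // Pad 5 digits to 8 (4 bytes, UTF-32BE); 4 digits are already 2 bytes (UTF-16BE).
            String padded = fiveDigits ? "000" + group : group;
            Charset charset = fiveDigits ? UTF_32BE : StandardCharsets.UTF_16BE;
            out.append(new String(hexToBytes(padded), charset));
            pos += len;
        }
        return out.toString();
    }

    /** Converts an even-length hex string to bytes (hypothetical helper). */
    private static byte[] hexToBytes(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }
}
```

With this sketch, `EmojiHexDecoder.decode("1f1e81f1f3")` should yield the two regional indicator symbols U+1F1E8 and U+1F1F3, and a mixed input such as "1f468200d1f469" should be split as 1f468, 200d, 1f469.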

It works, but it is not perfect. It will produce wrong output if both of the following hold:

One character of the emoji character sequence is a supplementary character (beyond U+FFFF)
And
Its hex string does not start with "1f", or the length of its hex string is not 5.
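
To illustrate this limitation, here is a small sketch that performs only the grouping step of the heuristic. The class name `HeuristicSplitDemo` is hypothetical, and the input is an assumed example: 1f3f4 followed by e0067 (U+E0067, a supplementary tag character whose hex string does not start with "1f").

```java
import java.util.ArrayList;
import java.util.List;

final class HeuristicSplitDemo {
    /** Splits a hex string into groups using the "1f means 5 digits, else 4" rule. */
    static List<String> split(String hex) {
        List<String> groups = new ArrayList<>();
        int pos = 0;
        while (pos < hex.length()) {
            int len = hex.regionMatches(true, pos, "1f", 0, 2) ? 5 : 4;
            // Clamp at the end so leftover digits show up as a short group.
            int end = Math.min(pos + len, hex.length());
            groups.add(hex.substring(pos, end));
            pos = end;
        }
        return groups;
    }
}
```

For "1f3f4e0067" the rule produces the groups 1f3f4, e006 and a leftover 7 instead of 1f3f4 and e0067, so the second character cannot be restored correctly.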

Code snippet:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.regex.*;

import org.apache.commons.codec.*;
import org.apache.commons.codec.binary.Hex;
import org.apache.commons.lang3.*;

public static final Charset UTF_32BE = Charset.forName ("UTF-32BE");
public static final String REGEXP_FindTransformedEmojiHexString = "";
public static final Pattern PATTERN_FindTransformedEmojiHexString = Pattern.compile (REGEXP_FindTransformedEmojiHexString, Pattern.CASE_INSENSITIVE);
public static String RestoreEmojiCharacters (String sContent)
{
        bMatched = true;
        String sEmojiHexString = matcher.group(1);

        Hex hex = new Hex (StandardCharsets.ISO_8859_1);
        try
        {
            for (int i=0; i
