HannahCarney

Reputation: 3631

Get the correct index of a character in a string by counting each emoji as one character

The problem in my Android app is that when I take the length of a string with emojis in it, each emoji counts as two or more characters. I'm working on the Android version of an app that also has an iOS version. iOS counts each emoji as one character, so an index returned from the iOS app assumes each emoji has a length of one.

"Hi i love 👻 @team"

I would like to get the index of @team when the only information I have is the index given by iOS, which is 13; on Android this may be 14 or even 15.

Upvotes: 4

Views: 3019

Answers (3)

jimbob

Reputation: 3298

After seeing new emojis keep getting released, my answer to this was to use a fairly well-maintained library.

I imported this library:

implementation 'com.vdurmont:emoji-java:4.0.0'

Then I created a utility method to get the length of a string counting emojis as 1:

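import com.vdurmont.emoji.EmojiParser

fun getLengthWithEmoji(s: String): Int {
    // Each emoji recognised by the library counts as one character.
    val emojiCount = EmojiParser.extractEmojis(s).size
    // Whatever remains after stripping the emojis is counted with String.length.
    val noEmojiString = EmojiParser.removeAllEmojis(s)
    return emojiCount + noEmojiString.length
}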

Generally to 'Get emoji count in string' I would use this line:

val emojiCount = EmojiParser.extractEmojis(s).size

This accounts for all the latest emojis (depending on how up to date your library is). Check some of the forks that others have made of the library, as in some cases they have added missing emoji patterns.

Upvotes: 2

Joop Eggen

Reputation: 109593

This answer proposes using Java's Unicode support for code points.

In the simplest case an emoji symbol (grapheme) is a single Unicode code point. Java also uses Unicode internally, but normally as UTF-16, where a char is a two-byte code unit, and an emoji's code point is far too high to fit in one char. Hence Java uses several chars for one emoji (a surrogate pair for a single code point).
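To make the char versus code point difference concrete, here is a minimal sketch using the string from the question. The class and method names are made up for illustration, and the ghost emoji (U+1F47B) is written as an escaped surrogate pair so the source does not depend on file encoding.

```java
final class SurrogatePairDemo {
    // U+1F47B (ghost) lies above U+FFFF, so UTF-16 needs a surrogate pair for it.
    static final String GHOST = "\uD83D\uDC7B";

    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Length(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String text = "Hi i love " + GHOST + " @team";
        System.out.println(utf16Length(GHOST));     // 2
        System.out.println(codePointLength(GHOST)); // 1
        System.out.println(utf16Length(text));      // 18
        System.out.println(codePointLength(text));  // 17
    }
}
```

The one-unit gap between the two lengths is exactly the extra char contributed by the emoji.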

But one can work with code points in Java. Java 8 adds some extra helpers; they are not required, but I hope Android already supports at least some of that functionality.

Taking the length in code points:

int codePointsLength(String s) {
    int n = 0;
    for (int i = 0; i < s.length(); ) {
        int codePoint = s.codePointAt(i);
        i += Character.charCount(codePoint);
        ++n;
    }
    return n;
}

// Alternative to the loop above (declare only one of the two): Java 8 streams.
int codePointsLength(String s) {
    return (int) s.codePoints().count(); // Java 8.
}

Making a string from an emoji, using the Unicode code point:

final int RAISED_EYEBROW = 0x1f928; // U+1F928.
String s = new String(new int[] {RAISED_EYEBROW}, 0, 1);

Finding the position of the string indexed by code point:

int codePointIndexOf(String s, int codePoint) {
    int n = 0;
    for (int i = 0; i < s.length(); ) {
        int cp = s.codePointAt(i);
        if (cp == codePoint) {
            return n;
        }
        i += Character.charCount(cp);
        ++n;
    }
    return -1;
}

// Java 9 takeWhile.
int codePointIndexOf(String s, int codePoint) {
    int totalCount = (int) s.codePoints().count();
    int count = (int) s.codePoints().takeWhile(cp -> cp != codePoint).count();
    return count >= totalCount ? -1 : count;
}
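For the conversion the question asks about (an index that counts each emoji once, turned into an index usable with Java's String methods), the standard methods String.offsetByCodePoints and String.codePointCount may already be enough. A minimal sketch, assuming the incoming index counts code points; the class and method names are hypothetical:

```java
final class CodePointIndex {
    // Converts an index counted in code points into a UTF-16 (char) index.
    static int toUtf16Index(String s, int codePointIndex) {
        return s.offsetByCodePoints(0, codePointIndex);
    }

    // Converts a UTF-16 (char) index into an index counted in code points.
    static int toCodePointIndex(String s, int utf16Index) {
        return s.codePointCount(0, utf16Index);
    }

    public static void main(String[] args) {
        // The string from the question, with the ghost emoji escaped.
        String text = "Hi i love \uD83D\uDC7B @team";
        int utf16 = toUtf16Index(text, 12);
        System.out.println(utf16);                           // 13
        System.out.println(text.startsWith("@team", utf16)); // true
        System.out.println(toCodePointIndex(text, utf16));   // 12
    }
}
```

With the string exactly as written here, "@team" starts at code point 12 and at char 13. This only lines up with iOS when every emoji is a single code point; flags, skin-tone variants and ZWJ sequences consist of several code points and would need grapheme-aware counting instead.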

Upvotes: 4

HannahCarney

Reputation: 3631

Thought I should post my answer since I've had two upvotes.

I decided it was best to treat the iOS indexes as "real" and Android's indexes as "fake", which meant converting everything to iOS indexes. graphemeGetIndex gets the iOS "real" index from the Java "fake" one, and graphemeLength gets the "real" length in case you need it.

Ask if you have any questions

// Counts user-perceived characters (graphemes) instead of UTF-16 chars.
public static int graphemeLength(String s) {
    BreakIterator it = BreakIterator.getCharacterInstance();
    it.setText(s);
    int count = 0;
    while (it.next() != BreakIterator.DONE) {
        count++;
    }
    return count;
}

public static int graphemeGetIndex(String wholeString, int mIndex) {
    BreakIterator it = BreakIterator.getCharacterInstance();
    int realStartIndex = 0;
    if (mIndex >= 0) {
        String partString = wholeString.substring(0, mIndex);
        it.setText(partString);
        while (it.next() != BreakIterator.DONE) {
            realStartIndex++;
        }
    }
    return realStartIndex;
}

private void recalculateIndices() {
    for (final UserMention mention : mMentions) {
        final int startFake = mCurrentText.indexOf("@" + mention.getName());
        final int startReal = graphemeGetIndex(mCurrentText, startFake);
        mention.setRealIndices(new int[]{startReal, startReal + graphemeLength(mention.getName())});
        mention.setJavaFakeIndices(new int[]{startFake, startFake + mention.getName().length()});
    }
}
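The helpers above go from Java indexes to iOS-style indexes. Going the other way, from an index that counts graphemes to a Java char index, can be sketched with the same BreakIterator. This is only a sketch under assumptions: GraphemeIndex and graphemeToJavaIndex are hypothetical names, and how multi-code-point emoji (flags, skin tones, ZWJ sequences) are grouped depends on the runtime's BreakIterator rules, so the result may not match iOS for every emoji.

```java
import java.text.BreakIterator;

final class GraphemeIndex {
    // Returns the UTF-16 (char) index at which the given grapheme starts.
    // Passing the total grapheme count returns s.length().
    static int graphemeToJavaIndex(String s, int graphemeIndex) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int boundary = it.first(); // always 0
        for (int i = 0; i < graphemeIndex; i++) {
            boundary = it.next();
            if (boundary == BreakIterator.DONE) {
                throw new IndexOutOfBoundsException("grapheme index " + graphemeIndex);
            }
        }
        return boundary;
    }

    public static void main(String[] args) {
        // The string from the question, with the ghost emoji escaped.
        String text = "Hi i love \uD83D\uDC7B @team";
        int javaIndex = graphemeToJavaIndex(text, 12);
        System.out.println(javaIndex);                           // 13
        System.out.println(text.startsWith("@team", javaIndex)); // true
    }
}
```

On Android, android.icu.text.BreakIterator (API 24 and later) exposes the same methods and could be swapped in if the platform's java.text rules turn out to lag behind newer emoji.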

Upvotes: 2
