Reputation: 2355
I have to count the characters in a given String, and I save the counts in a Map<Character, Long>. The code does not work with some special symbols like "two hearts". When I try to write such a symbol as a character literal, I get the compiler error "Too many characters in character literal" or similar. Why does this happen, and how can I fix it?
Here is some rough code to demonstrate the problem. This is not the full code.
import java.util.HashMap;
import java.util.Map;
public class Demo {
    public static void main(String[] args) {
        String twoHeartsStr = "💕";
        Map<Character, Long> output = new HashMap<>();
        output.put(twoHeartsStr.charAt(0), 1L);
        // Compiler error on the next statement:
        // IntelliJ IDE compiler: Too many characters in character literal.
        // java: unclosed character literal.
        Map<Character, Long> expectedOutput = Map.of('💕', 1L);
        System.out.println("Maps are equal : " + output.equals(expectedOutput));
    }
}
EDIT: Updated solution after getting answers to this question.
import java.util.HashMap;
import java.util.Map;
public class Demo {
    public static void main(String[] args) {
        String twoHeartsStr = "💕"; // Try #, alphabet, number etc.
        Map<String, Long> output = new HashMap<>();
        int codePoint = twoHeartsStr.codePointAt(0);
        String charValue = String.valueOf(Character.toChars(codePoint)); // Size = 2 for twoHearts.
        output.put(charValue, 1L);
        Map<String, Long> expectedOutput = Map.of("💕", 1L);
        System.out.println("Maps are equal : " + output.equals(expectedOutput)); // true.
    }
}
Upvotes: 0
Views: 1642
Reputation: 51083
By Java's definition, "💕" is not one character; it is two:
>>> "💕".length()
2 (int)
So '💕' is a syntax error, because char is a 16-bit integer type, and the Unicode symbol 💕 is not represented by just one 16-bit integer value.
The solution to your problem is to use strings instead.
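As a rough sketch of what "use strings" could look like for the counting task in the question: the class and method names below (SymbolCounter, countSymbols) are made up for illustration, and the two-hearts symbol is written as its surrogate pair \uD83D\uDC95 so the source file does not depend on the compiler's encoding setting.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original question's code.
final class SymbolCounter {

    // Counts every Unicode code point in the text, keyed by its String form.
    // A symbol outside the 16-bit range becomes a String of length 2.
    static Map<String, Long> countSymbols(String text) {
        return text.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        String twoHearts = "\uD83D\uDC95"; // two hearts, U+1F495
        Map<String, Long> counts = countSymbols("a" + twoHearts + "b" + twoHearts);
        System.out.println(counts.get(twoHearts)); // 2
        System.out.println(counts.get("a"));       // 1
    }
}
```

Keying the map by String rather than Character means that ordinary letters and supplementary symbols are handled by the same code path.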
Upvotes: 3
Reputation: 61607
The code does not work with some special symbols like "two hearts"... Why does this happen
The Java char type is a 16-bit value. In the early days of Unicode, this was sufficient to store all the code-point values, but that quickly changed. The current Unicode specification allows for over a million code points, and those above U+FFFF need to be represented in UTF-16 with a "surrogate pair" of two char values.
From the documentation:
A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
Moving on:
twoHeartsStr.charAt(0)
This will give you the first half of the surrogate pair, which is not a valid character on its own despite being a valid char value (char is fundamentally an integer type rather than a textual type).
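A small check makes this visible. The class name SurrogateDemo is made up for illustration, and the symbol is written as the escape \uD83D\uDC95, which is the surrogate pair for U+1F495.

```java
// Hypothetical demo class; it only inspects the two char values of the string.
final class SurrogateDemo {
    public static void main(String[] args) {
        String twoHeartsStr = "\uD83D\uDC95"; // two hearts, U+1F495

        char first = twoHeartsStr.charAt(0);  // first half of the pair
        char second = twoHeartsStr.charAt(1); // second half of the pair

        System.out.println(twoHeartsStr.length());                                  // 2
        System.out.println(Character.isHighSurrogate(first));                       // true
        System.out.println(Character.isLowSurrogate(second));                       // true
        System.out.println(twoHeartsStr.codePointCount(0, twoHeartsStr.length()));  // 1
        System.out.println(Integer.toHexString(twoHeartsStr.codePointAt(0)));       // 1f495
    }
}
```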
...and how to fix it ?
You can use 32-bit integers (i.e., int or Integer) to represent the values, and the codePointAt method to extract them from the string. Note, however, that when you iterate over the string, you'll still need to skip over the indices corresponding to the second halves of the pairs.
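One way to write that loop, as a minimal sketch: the class and method names (CodePointHistogram, count) are invented here, and Character.charCount is used to decide whether to advance the index by one or by two.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the iteration described above.
final class CodePointHistogram {

    // Builds a histogram keyed by the integer code-point value.
    static Map<Integer, Long> count(String text) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            counts.merge(codePoint, 1L, Long::sum);
            // Advance by 2 for a surrogate pair and by 1 otherwise,
            // so the second half of a pair is never visited on its own.
            i += Character.charCount(codePoint);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Integer, Long> counts = count("a\uD83D\uDC95a"); // 'a', two hearts, 'a'
        System.out.println(counts.get((int) 'a')); // 2
        System.out.println(counts.get(0x1F495));   // 1
    }
}
```

The String.codePoints() stream does the same skipping internally, so it is an alternative when an explicit index is not needed.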
You still won't be able to store the "supplementary characters" in a char, so you won't be able to write them in char literals. So to look up the two-hearts character in the resulting histogram (or to populate your reference data for testing), you'll want to get the integer code-point value from a string with that symbol.
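Applied to the test from the question, that could look roughly like the following. The class name CodePointLookup and the helper histogram are assumptions for the sake of a self-contained example; the key for the expected map comes from codePointAt on a one-symbol string instead of from a char literal.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo of building reference data keyed by code point.
final class CodePointLookup {

    // Histogram keyed by code point, built from the String.codePoints() stream.
    static Map<Integer, Long> histogram(String text) {
        return text.codePoints()
                .boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        String twoHeartsStr = "\uD83D\uDC95"; // two hearts, U+1F495

        // Obtain the key from a string, since it cannot be written as a char literal.
        int twoHearts = twoHeartsStr.codePointAt(0);

        Map<Integer, Long> output = histogram(twoHeartsStr);
        Map<Integer, Long> expectedOutput = Map.of(twoHearts, 1L);
        System.out.println("Maps are equal : " + output.equals(expectedOutput)); // true
    }
}
```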
Upvotes: 3