Reputation: 4399
I am trying to solve the following exercise from "Core Java for the Impatient" by Cay Horstmann:
When an encoder of a
Charset
with partial Unicode coverage can’t encode a character, it replaces it with a default—usually, but not always, the encoding of "?". Find all replacements of all available character sets that support encoding. Use thenewEncoder
method to get an encoder, and call itsreplacement
method to get the replacement. For each unique result, report the canonical names of the charsets that use it.
For the sake of education, I have decided to tackle the exercise with gargantuan one-liner using the streaming API, even though - in my opinion - a cleaner solution would divide the calculations into a number of steps, with intermediate variables in-between (certainly it would ease the debugging). Without further ado, here is a monster of code I have created:
Charset.availableCharsets().values().stream().filter(charset -> charset.canEncode()).collect(
Collectors.groupingBy(
charset -> charset.newEncoder().replacement(),
() -> new TreeMap<>((arr1, arr2) -> Arrays.equals(arr1, arr2) == true ? 0 : Integer.compare(arr1.hashCode(), arr2.hashCode())),
Collectors.mapping( charset -> charset.name(), Collectors.toList()))).
values().stream().map(list -> list.stream().collect(Collectors.joining(", "))).forEach(System.out::println);
Basically, we take into account only the charsets that canEncode
; create a Map
with replacement
as key and a list of canonical names as values; because grouping didn't work for arrays with default implementation of groupingBy
, which uses HashMap
, I have decided to use TreeMap
. We then work with the Lists
of canonical names, join them with comma and print.
Unfortunately, I have found it to give incoherent results. If I run the function twice in the same program, the first instance returns results consisting of 23 Strings
, the second one - just 21 Strings
. I suspect it has something to do with a poor implementation of Comparator
for TreeMap
, which was defined as follows:
((arr1, arr2) -> Arrays.equals(arr1, arr2) == true ? 0 : Integer.compare(arr1.hashCode(), arr2.hashCode()))
If that is the cause, what should be a proper Comparator
in this case? Apart from that, can the one-liner be improved in any way?
I am also curious if such convoluted constructs as the code I have written are encountered in professional programs? Maybe it's only me who find it unreadable?
Upvotes: 0
Views: 93
Reputation: 298389
There is no guarantee that the hash code of two distinct instances will be different. That would be an ideal situation, but is never guaranteed. Only the opposite is true: if two objects are equal, they have the same hash code.
So if you create a comparator that considers the objects to be the same when they have the same hash code, arbitrary objects might be considered to be the same. Since the byte[]
arrays returned by replacement()
are defensive copies, read temporary objects, the result may vary in every run of this code.
Further, since the hash code of an array has nothing to do with its content, your comparator violates the transitivity rule: two arrays with equal content are supposed to be the same, but since they might/very likely have different hash codes, they have a different relation when being compared with a third array, not having the same content, a == b
, but a < c
and b > c
. This is the reason why even equal arrays, which you compare by Arrays.equals
can end up in different groups, as the TreeSet
failed to find the existing key when comparing with other keys then.
If you want the arrays to be compared by value, you can use:
Charset.availableCharsets().values().stream().filter(Charset::canEncode).collect(
Collectors.groupingBy(
charset -> charset.newEncoder().replacement(),
() -> new TreeMap<>(Comparator.comparing(ByteBuffer::wrap)),
Collectors.mapping(Charset::name, Collectors.joining(", "))))
.values().forEach(System.out::println);
ByteBuffer
s are Comparable
and consistently evaluate the contents of the wrapped array.
I moved the Collectors.joining
collector into the grouping
collector to avoid the creation of the temporary List
whose contents you are going to join afterwards anyway.
By the way, never use code like expression == true
. There is no reason to append == true
as expression
is already sufficient.
Since you are only interested in the values, in other words, don’t need the keys to be of a certain type, you may wrap all arrays beforehand, simplifying the operation and even make it slightly more efficient:
Charset.availableCharsets().values().stream().filter(Charset::canEncode).collect(
Collectors.groupingBy(
charset -> ByteBuffer.wrap(charset.newEncoder().replacement()),
TreeMap::new,
Collectors.mapping(Charset::name, Collectors.joining(", "))))
.values().forEach(System.out::println);
This change even allows resorting to hashing, if no consistent iteration order is required:
Charset.availableCharsets().values().stream().filter(Charset::canEncode).collect(
Collectors.groupingBy(
charset -> ByteBuffer.wrap(charset.newEncoder().replacement()),
Collectors.mapping(Charset::name, Collectors.joining(", "))))
.values().forEach(System.out::println);
This works, because ByteBuffer
also implements equals
and hashCode
.
Upvotes: 2