Reputation: 6918
I was playing with algorithms in Dart and, since I was following TDD, I realized that my code has some limitations.
I was trying to reverse strings as part of an interview problem, but I couldn't get strings containing surrogate pairs to reverse correctly.
const simple = 'abc';
const emoji = '๐๐๐';
const surrogate = '๐ฎ๐ฝโโ๏ธ๐ฉ๐ฟโ๐ป';
String rev(String s) {
return String.fromCharCodes(s.runes.toList().reversed);
}
void main() {
print(simple);
print(rev(simple));
print(emoji);
print(rev(emoji));
print(surrogate);
print(rev(surrogate));
}
The output:
abc
cba
๐๐๐
๐๐๐
๐ฎ๐ฝโโ๏ธ๐ฉ๐ฟโ๐ป
๐ปโ๐ฟ๐ฉ๏ธโโ๐ฝ๐ฎ
You can see that the simple emojis are reversed correctly, because I'm using the runes instead of simply executing s.split('').toList().reversed.join(''), but the surrogate pairs are reversed incorrectly.
How can I reverse a string that might contain surrogate pairs using the Dart programming language?
Upvotes: 2
Views: 1132
Reputation:
Create an extension on String with a getter named reversed:
extension on String {
/// Reverse the string
String get reversed =>
GraphemeSplitter().splitGraphemes(this).toList().reversed.join();
}
To use the GraphemeSplitter class, install the grapheme_splitter package:
dart pub add grapheme_splitter
import "package:grapheme_splitter/grapheme_splitter.dart";
import "dart:io";
void main(final List<String> $) async {
test();
}
void test() async {
final Writer writer = Writer();
const simple = 'abc';
const emoji = '๐๐๐';
const surrogate = '๐ฎ๐ฝโโ๏ธ๐ฉ๐ฟโ๐ป';
const hell = "Zออซออชฬอซฬฝอฬดฬฬคฬออฬฏฬฬ อAอซอฬดอขฬตฬฬฐอLอจองอฉอฬ Gฬอฬฬ
ออฬดฬปอออฬนOอฬฬอฬจฬตฬนฬปฬฬณ";
const nightmare = "๐ท๐๐ฉ๐๐๐ณ๏ธโ๐";
await writer.print(simple);
await writer.print(emoji);
await writer.print(surrogate);
await writer.print(hell);
await writer.print(nightmare);
await writer.print(simple.reversed);
await writer.print(emoji.reversed);
await writer.print(surrogate.reversed);
await writer.print(hell.reversed);
await writer.print(nightmare.reversed);
}
class Writer {
final String filePath;
final File file;
Writer({this.filePath = "./data.dat"})
: file = File(filePath)
..writeAsStringSync(""); // Truncate the file if it already exists (sync, so later appends cannot race with it)
Future<void> print(final Object data) async {
await file.writeAsString("$data\n",
mode: FileMode.append); // Append to the file
}
}
extension on String {
/// Reverse the string
String get reversed =>
GraphemeSplitter().splitGraphemes(this).toList().reversed.join();
}
abc
๐๐๐
๐ฎ๐ฝโโ๏ธ๐ฉ๐ฟโ๐ป
Zออซออชฬอซฬฝอฬดฬฬคฬออฬฏฬฬ อAอซอฬดอขฬตฬฬฐอLอจองอฉอฬ Gฬอฬฬ
ออฬดฬปอออฬนOอฬฬอฬจฬตฬนฬปฬฬณ
๐ท๐๐ฉ๐๐๐ณ๏ธโ๐
cba
๐๐๐
๐ฉ๐ฟโ๐ป๐ฎ๐ฝโโ๏ธ
OอฬฬอฬจฬตฬนฬปฬฬณGฬอฬฬ
ออฬดฬปอออฬนLอจองอฉอฬ AอซอฬดอขฬตฬฬฐอZออซออชฬอซฬฝอฬดฬฬคฬออฬฏฬฬ อ
๐ณ๏ธโ๐๐๐๐ฉ๐๐ท
The output is written to a file instead of the terminal because most terminals will not render these characters properly.
Upvotes: 0
Reputation: 6918
Dart 2.7 introduced a new package that supports grapheme cluster-aware operations. The package is called characters. It is a package for characters represented as Unicode extended grapheme clusters.
Dart's standard String class uses the UTF-16 encoding. This is a common choice in programming languages, especially those that offer support for running both natively on devices, and on the web.
UTF-16 strings usually work well, and the encoding is transparent to the developer. However, when manipulating strings, and especially when manipulating strings entered by users, you may experience a difference between what the user perceives as a character, and what is encoded as a code unit in UTF-16.
Source: "Announcing Dart 2.7: A safer, more expressive Dart" by Michael Thomsen, section "Safe substring handling"
The package will also help you reverse strings containing emojis the way a reader would expect.
With plain Strings, you run into issues:
String hi = 'Hi 🇩🇰';
print('String.length: ${hi.length}');
// Prints 7; would expect 4
With characters:
String hi = 'Hi 🇩🇰';
print(hi.characters.length);
// Prints 4
print(hi.characters.last);
// Prints 🇩🇰
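To see why reversing by runes still breaks some emojis, it may help to compare the three ways of counting the same string. The sketch below is only an illustration: the helper name counts is hypothetical, and the emoji is written with escape sequences (woman, dark skin tone modifier, zero-width joiner, laptop) so that it survives any encoding problems. It assumes the characters package is already a dependency.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: returns [code units, code points, grapheme clusters].
List<int> counts(String s) =>
    [s.length, s.runes.length, s.characters.length];

void main() {
  // One visible emoji built from four code points joined into a ZWJ sequence.
  const technologist = '\u{1F469}\u{1F3FF}\u200D\u{1F4BB}';

  // 7 UTF-16 code units, 4 code points, but a single grapheme cluster.
  print(counts(technologist)); // [7, 4, 1]
}
```

Reversing the four code points puts the laptop first and the modifier and joiner in the wrong place, which is the scrambled output shown in the question.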
It's worth taking a look at the source code of the characters package. It's far from simple, but it looks easier to digest and better documented than grapheme_splitter. The characters package is also maintained by the Dart team.
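For the reversal asked about in the question, a minimal sketch with the characters package could look like the following. The function name reverseGraphemes is made up for this example, and the flag is written with escape sequences (regional indicators D and K) so it does not depend on the file encoding.

```dart
import 'package:characters/characters.dart';

/// Reverses [s] one extended grapheme cluster at a time, so multi-code-point
/// emojis stay intact. (Hypothetical helper name, not part of the package.)
String reverseGraphemes(String s) => s.characters.toList().reversed.join();

void main() {
  print(reverseGraphemes('abc')); // cba

  const flag = '\u{1F1E9}\u{1F1F0}'; // two code points, one grapheme cluster
  print(reverseGraphemes('Hi $flag') == '$flag iH'); // true
}
```

Because Characters is an Iterable<String> of grapheme clusters, toList().reversed.join() is enough here; no manual handling of surrogate pairs or joiners should be needed.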
Upvotes: 0
Reputation: 39158
When reversing strings, you must operate on graphemes, not on characters or code units. Use grapheme_splitter.
Upvotes: 2