Vince Varga

Reputation: 6918

How to reverse strings that contain surrogate pairs in Dart?

I was practicing algorithms in Dart and, because I was following TDD, my tests revealed a limitation in my code.

I was trying to reverse strings as part of an interview problem, but I couldn't get the surrogate pairs correctly reversed.

const simple = 'abc';
const emoji = '🍎🍏🐛';
const surrogate = '👮🏽‍♂️👩🏿‍💻';

String rev(String s) {
    return String.fromCharCodes(s.runes.toList().reversed);
}

void main() {
    print(simple);
    print(rev(simple));
    print(emoji);
    print(rev(emoji));
    print(surrogate);
    print(rev(surrogate));
}

The output:

abc
cba
🍎🍏🐛
🐛🍏🍎
👮🏽‍♂️👩🏿‍💻
💻‍🏿👩️♂‍🏽👮

As you can see, the simple emojis are reversed correctly because I'm using runes instead of simply calling s.split('').reversed.join(''), but the surrogate pairs are reversed incorrectly.

How can I reverse a string that might contain surrogate pairs using the Dart programming language?

Upvotes: 2

Views: 1132

Answers (3)

user8234870

Create an extension on String with a getter named reversed:

extension on String {
  /// Reverse the string
  String get reversed =>
      GraphemeSplitter().splitGraphemes(this).toList().reversed.join();
}

To get the GraphemeSplitter class, add the grapheme_splitter package:

dart pub add grapheme_splitter

Example Program:

import "package:grapheme_splitter/grapheme_splitter.dart";
import "dart:io";

Future<void> main() async {
  await test(); // Wait until every line has been written
}

Future<void> test() async {
  final Writer writer = Writer();

  const simple = 'abc';

  const emoji = '🍎🍏🐛';

  const surrogate = '👮🏽‍♂️👩🏿‍💻';

  const hell = "Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘";

  const nightmare = "🌷🎁💩😜👍🏳️‍🌈";

  await writer.print(simple);
  await writer.print(emoji);
  await writer.print(surrogate);
  await writer.print(hell);
  await writer.print(nightmare);

  await writer.print(simple.reversed);
  await writer.print(emoji.reversed);
  await writer.print(surrogate.reversed);
  await writer.print(hell.reversed);
  await writer.print(nightmare.reversed);
}

class Writer {
  final String filePath;
  final File file;

  // If the file exists, truncate it. This is done synchronously so that
  // later appends cannot race with an unfinished truncation.
  Writer({this.filePath = "./data.dat"})
      : file = File(filePath)..writeAsStringSync("");

  /// Appends [data] and a newline to the file.
  Future<void> print(final Object data) async {
    await file.writeAsString("$data\n", mode: FileMode.append);
  }
}

extension on String {
  /// Reverse the string
  String get reversed =>
      GraphemeSplitter().splitGraphemes(this).toList().reversed.join();
}

Output in the data.dat file:

abc
🍎🍏🐛
👮🏽‍♂️👩🏿‍💻
Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘
🌷🎁💩😜👍🏳️‍🌈
cba
🐛🍏🍎
👩🏿‍💻👮🏽‍♂️
Ǫ̵̹̻̝̳͂̌̌͘G̴̻͈͍͔̹̑͗̎̅͛́L̠ͨͧͩ͘A̴̵̜̰͔ͫ͗͢Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍
🏳️‍🌈👍😜💩🎁🌷

The output is written to a file rather than the terminal because most terminals do not render these characters properly.

Upvotes: 0

Vince Varga

Reputation: 6918

Dart 2.7 introduced a new package, characters, that supports grapheme-cluster-aware string operations. It represents characters as Unicode extended grapheme clusters.

Dart’s standard String class uses the UTF-16 encoding. This is a common choice in programming languages, especially those that offer support for running both natively on devices, and on the web.

UTF-16 strings usually work well, and the encoding is transparent to the developer. However, when manipulating strings, and especially when manipulating strings entered by users, you may experience a difference between what the user perceives as a character, and what is encoded as a code unit in UTF-16.

Source: "Announcing Dart 2.7: A safer, more expressive Dart" by Michael Thomsen, section "Safe substring handling"

The package also lets you reverse strings that contain emojis the way a reader would expect.

With a plain String, you run into issues:

String hi = 'Hi 🇩🇰';
print('String.length: ${hi.length}');
// Prints 7; would expect 4

With characters:

String hi = 'Hi 🇩🇰';
print(hi.characters.length);
// Prints 4
print(hi.characters.last);
// Prints 🇩🇰

It's worth taking a look at the source code of the characters package. It's far from simple, but it looks easier to digest and better documented than grapheme_splitter. The characters package is also maintained by the Dart team.
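For the reversal asked about in the question, a minimal sketch could look like the following. It assumes the characters package is listed as a dependency; reverseGraphemes, officer and technologist are names made up for this example, and the two emoji sequences are written as escapes only to make their code points visible.

```dart
import 'package:characters/characters.dart';

/// Reverses [input] one extended grapheme cluster at a time, so emoji built
/// from several code points stay intact. (Illustrative helper name.)
String reverseGraphemes(String input) =>
    input.characters.toList().reversed.join();

// The two emoji sequences from the question, spelled out as code points.
// Police officer + skin tone + zero-width joiner + male sign + variation selector.
const officer = '\u{1F46E}\u{1F3FD}\u{200D}\u{2642}\u{FE0F}';
// Woman + skin tone + zero-width joiner + laptop.
const technologist = '\u{1F469}\u{1F3FF}\u{200D}\u{1F4BB}';

void main() {
  print(reverseGraphemes('abc')); // cba

  // The two sequences should swap places while each keeps its internal order.
  print(reverseGraphemes(officer + technologist) == technologist + officer);
}
```

Characters is an Iterable<String>, so the sketch copies it into a list first and then reverses that list.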

Upvotes: 0

daxim

Reputation: 39158

When reversing strings, you must operate on graphemes, not on characters or code units. Use grapheme_splitter.
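To see why neither of the lower levels is enough, here is a small sketch that uses only dart:core. The helper names reverseByCodePoint and codePoints are invented for this illustration; the string is one of the emoji sequences from the question, written as escapes so each code point is visible.

```dart
/// Reverses [input] by code point, as the code in the question does.
/// (Illustrative helper name.)
String reverseByCodePoint(String input) =>
    String.fromCharCodes(input.runes.toList().reversed);

/// Lists the code points of [input] in U+XXXX notation.
String codePoints(String input) => input.runes
    .map((rune) => 'U+${rune.toRadixString(16).toUpperCase()}')
    .join(' ');

void main() {
  // Woman + dark skin tone + zero-width joiner + laptop: a reader sees
  // a single symbol.
  const technologist = '\u{1F469}\u{1F3FF}\u{200D}\u{1F4BB}';

  print(technologist.length); // 7 (UTF-16 code units)
  print(technologist.runes.length); // 4 (code points)

  print(codePoints(technologist));
  // U+1F469 U+1F3FF U+200D U+1F4BB

  // Reversing by code point keeps every surrogate pair intact but puts the
  // pieces of the sequence in the wrong order.
  print(codePoints(reverseByCodePoint(technologist)));
  // U+1F4BB U+200D U+1F3FF U+1F469
}
```

A grapheme-aware splitter treats those four code points as one unit, which is why it can reverse the string without taking the sequence apart.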

Upvotes: 2
