Rajaghaneshan

Reputation: 41

How to convert Unicode with hex to a String in Dart / Flutter

%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD

The string above consists of Unicode code points written as hex in %uXXXX form. I need to convert it to readable text. When decoded, it should return வணக்கம், meaning "welcome".

Upvotes: 3

Views: 5992

Answers (1)

jamesdlin

Reputation: 89975

If you want a hard-coded string, as noted in Special characters in Flutter and in the Dart Language Tour, you can use \u to specify Unicode code points:

var welcome = '\u0BB5\u0BA3\u0B95\u0BCD\u0B95\u0BAE\u0BCD';
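As a quick sanity check of the escape syntax, the sketch below compares the escaped literal against the plain Tamil text. The braced `\u{...}` form is standard Dart for code points above U+FFFF; the emoji is only an illustrative choice and is not part of the question.

```dart
// Four-hex-digit escapes: one escape per UTF-16 code unit.
const welcome = '\u0BB5\u0BA3\u0B95\u0BCD\u0B95\u0BAE\u0BCD';

// Braced form for a code point above U+FFFF (illustrative choice).
const emoji = '\u{1F600}';

void main() {
  print(welcome == 'வணக்கம்'); // Prints: true
  print(welcome.length); // Prints: 7
  print(emoji.runes.length); // Prints: 1
  print(emoji.length); // Prints: 2
}
```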

If you are given a string '%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD' and need to convert it dynamically at runtime, then you will need to:

  1. Split the string into %uXXXX components.
  2. Parse the XXXX portion as a hexadecimal integer to get the code point.
  3. Construct a String from the code points.

void main() {
  var s = '%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD';
  var re = RegExp(r'%u(?<codePoint>[0-9A-Fa-f]{4})');
  var matches = re.allMatches(s);
  var codePoints = [
    for (var match in matches)
      int.parse(match.namedGroup('codePoint')!, radix: 16),
  ];
  var decoded = String.fromCharCodes(codePoints);
  print(decoded); // Prints: வணக்கம்
}

Edit 1

An adjusted version that can handle strings with a mixture of encoded code points and unencoded characters:

void main() {
  var s = '%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD'
      ' hello world! '
      '%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD';
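  // dotAll lets '.' match line terminators too, so unencoded newlines in
  // the input are kept rather than silently dropped.
  var re = RegExp(r'(%u(?<codePoint>[0-9A-Fa-f]{4}))|.', dotAll: true);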
  var matches = re.allMatches(s);
  var codePoints = <int>[];
  for (var match in matches) {
    var codePoint = match.namedGroup('codePoint');
    if (codePoint != null) {
      codePoints.add(int.parse(codePoint, radix: 16));
    } else {
      codePoints += match.group(0)!.runes.toList();
    }
  }
  var decoded = String.fromCharCodes(codePoints);
  print(decoded); // Prints: வணக்கம் hello world! வணக்கம்
}
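An alternative sketch of the same mixed-input decoding uses `String.replaceAllMapped`, which rewrites only the %uXXXX sequences and leaves every other character, including newlines, where it is. The function name `decodePercentU` is hypothetical, chosen for illustration.

```dart
/// Hypothetical helper: replaces each %uXXXX sequence with the UTF-16
/// code unit it encodes and leaves all other text untouched.
String decodePercentU(String input) {
  final pattern = RegExp(r'%u([0-9A-Fa-f]{4})');
  return input.replaceAllMapped(
    pattern,
    (match) => String.fromCharCode(int.parse(match.group(1)!, radix: 16)),
  );
}

void main() {
  print(decodePercentU('%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD hello!'));
  // Prints: வணக்கம் hello!
}
```

Because Dart strings are sequences of UTF-16 code units, two consecutive escapes that form a surrogate pair (for example `%uD83D%uDE00`) end up as a single character in the result.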

Edit 2

The versions above assumed that your input would consist only of Unicode code points encoded as %uHHHH (where H is a hexadecimal digit) and of raw ASCII characters. However, your new version of this question indicates that you actually need to handle a mixture of:

  • Unicode code points encoded as %uHHHH.
  • Raw (unencoded) ASCII characters.
  • ASCII characters encoded as %HH.

To handle that third case:

void main() {
  var s = '%3Cp%3E%3Cb%3E%u0B87%u0BA8%u0BCD%u0BA4%u0BBF%u0BAF%u0BBE%u0BB5%u0BBF%u0BA9%u0BCD%20%u0BAA%u0BC6%u0BB0%u0BC1%u0BAE%u0BCD%u0BAA%u0BBE%u0BA9%u0BCD%u0BAE%u0BC8%u0BAF%u0BBE%u0BA9%20%u0BAE%u0B95%u0BCD%u0B95%u0BB3%u0BCD%20%u0BAA%u0BB4%u0B99%u0BCD%u0B95%u0BBE%u0BB2%u0BA4%u0BCD%u0BA4%u0BBF%u0BB2%u0BBF%u0BB0%u0BC1%u0BA8%u0BCD%u0BA4%u0BC7%20.........%20%u0BAA%u0BCB%u0BA9%u0BCD%u0BB1%u0BC1%20%u0BA4%u0BBE%u0BA9%u0BBF%u0BAF%u0B99%u0BCD%u0B95%u0BB3%u0BC8%20%u0BAE%u0BC1%u0B95%u0BCD%u0B95%u0BBF%u0BAF%20%u0B89%u0BA3%u0BB5%u0BBE%u0B95%u0BAA%u0BCD%20%u0BAA%u0BAF%u0BA9%u0BCD%u0BAA%u0B9F%u0BC1%u0BA4%u0BCD%u0BA4%u0BBF%u0BA9%u0BB0%u0BCD.%3C/b%3E%0A%3Col%20type%3D%22I%22%20style%3D%22font-weight%3Abold%3B%22%3E%0A%3Cli%3E%3Cspan%20style%3D%22font-weight%3Anormal%3B%22%3E%20%u0B85%u0BB0%u0BBF%u0B9A%u0BBF%3C/span%3E%3C/li%3E%0A%3Cli%3E%3Cspan%20style%3D%22font-weight%3Anormal%3B%22%3E%20%u0B95%u0BC7%u0BB4%u0BCD%u0BB5%u0BB0%u0B95%u0BC1%20%3C/span%3E%3C/li%3E%0A%3Cli%3E%3Cspan%20style%3D%22font-weight%3Anormal%3B%22%3E%20%u0B93%u0B9F%u0BCD%u0BB8%u0BCD%3C/span%3E%3C/li%3E%0A%3Cli%3E%3Cspan%20style%3D%22font-weight%3Anormal%3B%22%3E%20%u0BAA%u0BB0%u0BC1%u0BAA%u0BCD%u0BAA%u0BC1%3C/span%3E%3C/li%3E%3C/ol%3E%3C/p%3E';
  var re = RegExp(
    r'(%(?<asciiValue>[0-9A-Fa-f]{2}))'
    r'|(%u(?<codePoint>[0-9A-Fa-f]{4}))'
    r'|.',
    // Let '.' match line terminators so raw newlines are not dropped.
    dotAll: true,
  );
  var matches = re.allMatches(s);
  var codePoints = <int>[];
  for (var match in matches) {
    var codePoint = match.namedGroup('asciiValue') ?? match.namedGroup('codePoint');
    if (codePoint != null) {
      codePoints.add(int.parse(codePoint, radix: 16));
    } else {
      codePoints += match.group(0)!.runes.toList();
    }
  }
  var decoded = String.fromCharCodes(codePoints);
  print(decoded);
}

which prints:

<p><b>இந்தியாவின் பெரும்பான்மையான மக்கள் பழங்காலத்திலிருந்தே ......... போன்று தானியங்களை முக்கிய உணவாகப் பயன்படுத்தினர்.</b>
<ol type="I" style="font-weight:bold;">
<li><span style="font-weight:normal;"> அரிசி</span></li>
<li><span style="font-weight:normal;"> கேழ்வரகு </span></li>
<li><span style="font-weight:normal;"> ஓட்ஸ்</span></li>
<li><span style="font-weight:normal;"> பருப்பு</span></li></ol></p>
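If the decoding is needed in more than one place, the three cases can be folded into a small reusable function. The sketch below is one way to do that, not the only one; the name `unescapePercent` is hypothetical. The input format resembles what JavaScript's legacy `escape()` function produces, but that is an assumption about where the data came from.

```dart
/// Hypothetical helper covering %uXXXX, %XX, and raw characters.
String unescapePercent(String input) {
  // Match the %uXXXX form or the two-digit %XX form; anything else
  // (including a stray '%') is left as-is.
  final pattern = RegExp(r'%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})');
  return input.replaceAllMapped(pattern, (match) {
    // Exactly one of the two groups participates in each match.
    final hex = match.group(1) ?? match.group(2)!;
    return String.fromCharCode(int.parse(hex, radix: 16));
  });
}

void main() {
  print(unescapePercent('%3Cb%3E%u0B85%u0BB0%u0BBF%u0B9A%u0BBF%3C/b%3E'));
  // Prints: <b>அரிசி</b>
}
```

Text that does not match either pattern passes through unchanged, so a literal `100%` in the input stays `100%`.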

There are packages that can render HTML (e.g. package:flutter_html and probably various others). Otherwise I'm going to consider dealing with the HTML to be outside the scope of this answer, and that would deserve its own question anyway.

Upvotes: 2
