lino arango

Reputation: 142

Emoji and accent encoding in Dart/Flutter

I get the following string from my API:

"à é í ó ú ü ñ \uD83D\uDE00\uD83D\uDE03\uD83D\uDE04\uD83D\uDE01\uD83D\uDE06\uD83D\uDE05"

It comes from a response in JSON format:

{
    'apiText': "à é í ó ú ü ñ \uD83D\uDE00\uD83D\uDE03\uD83D\uDE04\uD83D\uDE01\uD83D\uDE06\uD83D\uDE05",
    'otherInfo': 'etc.',
    .
    .
    .
}

It contains accents (à é í ó ú ü ñ) that are not correctly encoded, and it contains the emojis \uD83D\uDE00\uD83D\uDE03\uD83D\uDE04\uD83D\uDE01\uD83D\uDE06\uD83D\uDE05.

So far I have tried:

var json = jsonDecode(response.body);
String apiText = json['apiText'];
List<int> bytes = apiText.codeUnits;
comentario = utf8.decode(bytes);

but it produces:

[ERROR:flutter/lib/ui/ui_dart_state.cc(166)] Unhandled Exception: FormatException: Invalid UTF-8 byte (at offset 21)

How can I get the correct text with accents and emoji?

Upvotes: 6

Views: 6036

Answers (2)

Mashood .H

Reputation: 1782

Rewrite your function as:

String utf8convert(String text) {
  var bytes = text.codeUnits;

  String decodedCode = utf8.decode(bytes, allowMalformed: true);
  if (decodedCode.contains("�")) {
    return text;
  }
  return decodedCode;
}

Upvotes: 0

julemand101

Reputation: 31219

Based on the fact that you called response.body, I assume you are using the http package, whose Response objects have a body property.

You should note the following detail in the documentation:

This is converted from bodyBytes using the charset parameter of the Content-Type header field, if available. If it's unavailable or if the encoding name is unknown, latin1 is used by default, as per RFC 2616.

It seems rather likely that the package cannot figure out the charset and therefore defaults to latin1, which explains how your response got garbled.

A solution is to use response.bodyBytes instead, which contains the raw bytes of the response. You can then decode these manually, e.g. with utf8.decode(response.bodyBytes), if you are sure the response should be parsed as UTF-8.
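As a rough sketch of the difference, the snippet below uses only dart:convert and no network call. The helper name decodeApiText and the sample body are made up for illustration; the byte list stands in for response.bodyBytes, and the 'apiText' key is taken from the question. It assumes the server really sends UTF-8.

```dart
import 'dart:convert';

/// Hypothetical helper: decodes raw response bytes as UTF-8 first,
/// then parses the JSON and returns the 'apiText' field.
String decodeApiText(List<int> bodyBytes) {
  final String body = utf8.decode(bodyBytes);
  final dynamic json = jsonDecode(body);
  return json['apiText'] as String;
}

void main() {
  // Stand-in for response.bodyBytes (sample data, not a real response).
  // The emoji is written as JSON \u escapes, as in the question.
  final List<int> bodyBytes =
      utf8.encode(r'{"apiText": "à é ñ \uD83D\uDE00"}');

  // Roughly what response.body yields when the charset is unknown:
  // the same bytes read as latin1, so each accent becomes two characters.
  final String asLatin1 = latin1.decode(bodyBytes);
  print(jsonDecode(asLatin1)['apiText']); // accents come out garbled

  // Decoding the raw bytes as UTF-8 keeps the accents and the emoji.
  print(decodeApiText(bodyBytes)); // à é ñ 😀
}
```

With the http package, the equivalent call would be jsonDecode(utf8.decode(response.bodyBytes)). Note that the emoji survive either way because they arrive as ASCII \u escapes; only the accented characters, sent as raw UTF-8 bytes, are affected by the latin1 fallback.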

Upvotes: 14
