Minion

Reputation: 21

Encode a String to UTF-8 in Perl

I cannot get the utf8 library to convert my data to UTF-8:

#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use JSON;

my $data = qq( { "cat" : "Büster" } );
$data= utf8::encode($data);
$data= JSON::decode_json($data);
print $data->{"cat"};

OUTPUT:

malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "(end of string)")

I do not want to use Unicode::UTF8 or Encode; I want to solve this problem using the utf8 library.

Upvotes: 0

Views: 12468

Answers (3)

choroba

Reputation: 241858

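You need utf8::encode, not decode. Both of them change their argument in place and return nothing (undef in scalar context), so assigning the return value back to the variable replaces the string you just encoded with undef. That is why decode_json reports a malformed JSON string at character offset 0.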
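A minimal sketch of that in-place behaviour, using illustrative variable names and writing ü as the escape \x{FC} so the snippet does not depend on how the source file is saved:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name  = "B\x{FC}ster";      # 6 characters; "\x{FC}" is "ü" (U+00FC)
my $bytes = $name;              # work on a copy so $name stays decoded

my $ret = utf8::encode($bytes); # modifies $bytes in place

# $bytes now holds UTF-8 bytes: "ü" became the two bytes 0xC3 0xBC.
printf "characters: %d, bytes: %d\n", length($name), length($bytes);   # characters: 6, bytes: 7

# The return value carries nothing useful; assigning it back
# (as in "$data = utf8::encode($data)") would discard the encoded string.
print defined($ret) ? "return value defined\n" : "return value is undef\n";
```

Applied to the question's script, this means calling utf8::encode($data) as a statement of its own: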

#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use JSON;

my $data = qq({"cat":"Büster"});
utf8::encode($data);
$data = JSON::decode_json($data);
binmode *STDOUT, ':encoding(UTF-8)';
print $data->{cat};

Moreover, the output filehandle needs to know which encoding to use; that is what the binmode call does.
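To see what the layer changes, here is a small sketch that prints the same decoded string to two in-memory filehandles standing in for STDOUT, one without a layer and one with :encoding(UTF-8). The handle and buffer names are illustrative; calling binmode with that layer on an already-open handle has the same effect as the open mode used here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "B\x{FC}ster";    # decoded text; "\x{FC}" is "ü"

# No encoding layer: characters below 256 are written as single (Latin-1) bytes.
open my $raw_fh, '>', \my $raw_out or die "open: $!";
print {$raw_fh} $name;
close $raw_fh;

# With the UTF-8 layer: characters are encoded as UTF-8 on output.
open my $utf8_fh, '>:encoding(UTF-8)', \my $utf8_out or die "open: $!";
print {$utf8_fh} $name;
close $utf8_fh;

printf "without layer: %d bytes\n", length $raw_out;    # "ü" is one byte, 0xFC
printf "with layer:    %d bytes\n", length $utf8_out;   # "ü" is two bytes, 0xC3 0xBC
```

A terminal that expects UTF-8 cannot display the lone 0xFC byte as ü, which is why the output handle needs the layer.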

Also, make sure the source file is saved as UTF-8, because use utf8 tells Perl to read the source code as UTF-8.

Upvotes: 1

pii_ke

Reputation: 2891

For encoding strings to UTF-8 bytes, the core module Encode can be used.

I think the following code does what you want:

#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use JSON::PP;
use Encode;

my $json = Encode::encode_utf8 q( { "cat" : "Büster" } );
my $data= JSON::PP::decode_json($json);
print Encode::encode_utf8 $data->{"cat"};

I have used the JSON::PP core module. You can replace that with JSON. They are compatible.
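Unlike utf8::encode, the Encode functions return a new string and leave their argument alone. A minimal sketch with illustrative variable names:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( encode_utf8 decode_utf8 );

my $text  = "B\x{FC}ster";         # decoded text, 6 characters ("\x{FC}" is "ü")
my $bytes = encode_utf8($text);    # new string of UTF-8 bytes; $text is unchanged
my $again = decode_utf8($bytes);   # back to decoded text

printf "text: %d chars, bytes: %d\n", length $text, length $bytes;   # text: 6 chars, bytes: 7
print $again eq $text ? "round trip ok\n" : "round trip failed\n";
```

That is why my $json = Encode::encode_utf8 ... works as an assignment, while $data = utf8::encode($data) does not.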

Upvotes: 3

ikegami

Reputation: 385789

Two problems:

  • utf8::encode encodes in-place; it doesn't return the encoded string.
  • You need to encode the output appropriately for your terminal.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw( say );

# "Str of UCP" means "string of decoded text aka string of Unicode Code Points".
# "Str of UTF-8" means "string of text encoded using UTF-8 (bytes)".

use utf8;                                # Source code encoded using UTF-8
use open ':std', ':encoding(UTF-8)';     # Terminal provides/expects UTF-8

use JSON qw( decode_json );

my $json = qq( { "cat" : "Büster" } );   # Str of UCP because of "use utf8"

utf8::encode($json);                     # Str of UCP => Str of UTF-8
my $data = decode_json($json);           # Str of UTF-8 => Hash of str of UCP

say $data->{"cat"};                      # Expects str of UCP because of "use open :std"
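To make the two terms from the comments concrete, here is a small sketch that lists the elements of each kind of string as hex numbers. The helper code_units is hypothetical, written only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the ordinal of every element of a string, in hex.
sub code_units {
    my ($str) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $str;
}

my $ucp      = "B\x{FC}ster";   # Str of UCP: one element per code point
my $utf8_str = $ucp;
utf8::encode($utf8_str);        # Str of UTF-8: one element per byte

print "UCP:   ", code_units($ucp),      "\n";   # 42 FC 73 74 65 72
print "UTF-8: ", code_units($utf8_str), "\n";   # 42 C3 BC 73 74 65 72
```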

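Alternatively, we can avoid the encoding-decoding round trip by using a JSON object, which accepts decoded text directly as long as its utf8 option is not enabled (same pragmas as above):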

my $json = qq( { "cat" : "Büster" } );   # Str of UCP because of "use utf8"

my $decoder = JSON->new;
my $data = $decoder->decode($json);      # Str of UCP => Hash of str of UCP

say $data->{"cat"};                      # Expects str of UCP because of "use open :std"
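The two approaches differ only in which kind of string the decoder expects: decode_json expects UTF-8 bytes, while an object created with new (utf8 option off) expects decoded text. A minimal sketch using the core module JSON::PP, which the JSON module can use as its backend and which offers the same interface:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP ();

my $json_text  = qq({"cat":"B\x{FC}ster"});   # Str of UCP
my $json_bytes = $json_text;
utf8::encode($json_bytes);                     # Str of UTF-8

my $from_bytes = JSON::PP::decode_json($json_bytes);   # expects UTF-8 bytes
my $from_text  = JSON::PP->new->decode($json_text);    # utf8 option off: expects text

# Both produce the same decoded value, 6 characters long.
print $from_bytes->{cat} eq $from_text->{cat} ? "same\n" : "different\n";
print length($from_text->{cat}), "\n";
```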

Upvotes: 3
