Reputation: 1086
I'm trying to parse a UTF-8 JSON file in Perl. https://jsonlint.com/ says the JSON is valid, yet I still get this error message:
malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "\x{ef}\x{bb}\x{bf}{"...") at parse.pl line 15.
The code is:
use strict;
use utf8;
use JSON qw( );
my $filename = 'k2.json';
my $json_text = do {
    open(my $json_fh, $filename) or die("Can't open $filename: $!\n");
    local $/;
    <$json_fh>
};
my $json = JSON->new;
my $data = $json->decode($json_text);
for ( @{$data->{data}} ) {
    print $_->{lng}."\n";
}
The UTF-8-encoded JSON is:
{"data":
[{"lng":"19.03252602",
"lat":"47.49795914",
"display_name":"I. kerület (Attila út)",
"active":"1",
"url":"/hu/kormanyablakok/budapest/i-kerulet/i-kerulet-attila-ut/283"
}]
}
I see that EF BB BF are the three bytes that mark the file as UTF-8, so I don't understand what the JSON module is missing here. How can I make it work?
Specifying "<:encoding(UTF-8)" when opening the file did not help either.
Upvotes: 0
Views: 1255
Reputation: 385867
use strict;
use warnings qw( all );
use utf8;
use open ':std', ':encoding(UTF-8)';
use feature qw( say );
use JSON qw( );
my $filename = 'k2.json';
my $json_text = do {
    open(my $json_fh, '<', $filename)
        or die("Can't open $filename: $!\n");
    local $/;
    <$json_fh>
};
$json_text =~ s/^\N{BOM}//;
my $data = JSON->new->decode($json_text);
say $_->{lng} for @{ $data->{data} };
or
use strict;
use warnings qw( all );
use utf8;
use open ':std', ':encoding(UTF-8)';
use feature qw( say );
use File::BOM qw( open_bom );
use JSON qw( );
my $filename = 'k2.json';
my $json_text = do {
    open_bom(my $json_fh, $filename, ':encoding(UTF-8)')
        or die("Can't open $filename: $!\n");
    local $/;
    <$json_fh>
};
my $data = JSON->new->decode($json_text);
say $_->{lng} for @{ $data->{data} };
Notes:
use open ':std', ':encoding(UTF-8)'; makes output to STDOUT be encoded as UTF-8. This is required if you print display_name from your example, since it contains non-ASCII characters.
It also sets the default encoding used to decode the JSON file when it is opened in the first snippet.
I left in use utf8;, but it doesn't do anything since the source code is entirely ASCII.
Upvotes: 1
Reputation: 118605
JSON does not expect the input to start with a byte order mark (BOM). Strip it before you run the JSON decoder:
$json_text =~ s/^[^\x00-\x7f]+//;
my $data = $json->decode($json_text);
The byte order mark was not included in what you pasted into JSONLint, so JSONLint was not evaluating the same document that you have.
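A variant that works on bytes instead of characters, as a minimal sketch: read the file without any :encoding layer, remove exactly the three BOM bytes EF BB BF (rather than any leading non-ASCII bytes), and let the decoder handle UTF-8 itself via ->utf8. This assumes the file really is UTF-8, uses the core JSON::PP module in place of JSON, and the helper name decode_json_bytes is hypothetical. An in-memory filehandle stands in for k2.json so the snippet runs on its own.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: decode a UTF-8 JSON byte string that may start with a BOM.
sub decode_json_bytes {
    my ($bytes) = @_;
    # Remove only the UTF-8 encoding of U+FEFF (EF BB BF), if present.
    $bytes =~ s/\A\xEF\xBB\xBF//;
    # ->utf8 tells the decoder its input is UTF-8 bytes, so it decodes them itself.
    return JSON::PP->new->utf8->decode($bytes);
}

# Sample input: a BOM followed by part of the question's JSON, as UTF-8 bytes.
my $sample = "\xEF\xBB\xBF"
    . qq({"data":[{"lng":"19.03252602","display_name":"I. ker\xC3\xBClet (Attila \xC3\xBAt)"}]});

# Read through a raw (byte) filehandle, as you would for 'k2.json';
# an in-memory file stands in for the real one here.
open(my $fh, '<:raw', \$sample) or die "Can't open in-memory file: $!\n";
my $json_bytes = do { local $/; <$fh> };
close($fh);

my $data = decode_json_bytes($json_bytes);

# display_name is now a character string, so encode it on output.
binmode(STDOUT, ':encoding(UTF-8)');
for my $entry (@{ $data->{data} }) {
    print "$entry->{lng}\n";
    print "$entry->{display_name}\n";
}
```

The :raw layer matters here: with an :encoding(UTF-8) layer the input would already be characters, and ->utf8 on top of that would try to decode it a second time.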
Upvotes: 2