con

Reputation: 6093

Perl & Python write non-ASCII characters into JSON differently

I have hash keys that look like this:

1я310яHOM_REF_truth:HOM_ALT_test:discordant_hom_ref_to_hom_altяAяC

This is a string joined by the Cyrillic letter я, which I chose as a delimiter because it will never appear in these files.

I write this to a JSON file in Perl 5.30.2 thus:

use JSON 'encode_json';

sub hash_to_json_file {
    my $hash     = shift;
    my $filename = shift;
    my $json = encode_json $hash;
    open my $out, '>', $filename;
    say $out $json
}

and in Python 3.8:

import json
def hash_to_json_file(hashtable,filename):
    json1=json.dumps(hashtable)
    f = open(filename,"w+")
    print(json1,file=f)
    f.close()

When I try to load a JSON file written by Python back into a Perl script, I see a cryptic error that I don't know how to solve:

Wide character in say at read_json.pl line 27.

After reading https://perldoc.perl.org/perlunifaq.html, I tried adding use utf8 to my script, but it doesn't help. I also tried '>:encoding(UTF-8)' within my subroutine, but the same error results.

Upon inspection of the JSON files, I see keys like "1Ñ180ÑHET_ALT_truth:HET_REF_test:discordant_het_alt_to_het_refÑAÑC,G" where ÑAÑ substitutes я. In the JSON written by Python, I see \u044f. I think that this is the wide character, but I don't know how to change it back.

I've also tried changing my subroutine:

use Encode 'decode';
sub json_file_to_hash {
   my $file = shift;
   open my $in, '<:encoding(UTF-8)', $file;
   my $json = <$in>;
   my $ref = decode_json $json;
   $ref = decode('UTF-8', $json);
   return %{ $ref }
}

but this gives another error:

Wide character in hash dereference at read_json.pl line 17, <$_[...]> line 1

How can I get python JSON read into Perl correctly?

Upvotes: 1

Views: 349

Answers (2)

ikegami

Reputation: 385887

use utf8;                               # Source is encoded using UTF-8
use open ':std', ':encoding(UTF-8)';    # For say to STDOUT.  Also default for open()

use feature qw( say );                  # Enables say

use JSON qw( decode_json encode_json );

sub hash_to_json_file {
    my $qfn = shift;
    my $ref = shift;
    my $json = encode_json($ref);       # Produces UTF-8
    open(my $fh, '>:raw', $qfn)         # Write it unmangled
       or die("Can't create \"$qfn\": $!\n");

    say $fh $json;
}

sub json_file_to_hash {
    my $qfn = shift;
    open(my $fh, '<:raw', $qfn)         # Read it unmangled
       or die("Can't open \"$qfn\": $!\n");

    local $/;                           # Read whole file
    my $json = <$fh>;                   # This is UTF-8
    my $ref = decode_json($json);       # This produces decoded text
    return $ref;                        # Return the ref rather than the keys and values.
}

my $src = { key => "1я310яHOM_REF_truth:HOM_ALT_test:discordant_hom_ref_to_hom_altяAяC" };
hash_to_json_file("a.json", $src);
my $dst = json_file_to_hash("a.json");
say $dst->{key};

You could also avoid using :raw by using from_json and to_json.

use utf8;                               # Source is encoded using UTF-8
use open ':std', ':encoding(UTF-8)';    # For say to STDOUT. Also default for open()

use feature qw( say );                  # Enables say

use JSON qw( from_json to_json );

sub hash_to_json_file {
    my $qfn  = shift;
    my $hash = shift;
    my $json = to_json($hash);          # Produces decoded text.
    open(my $fh, '>', $qfn)             # "use open" will add :encoding(UTF-8)
       or die("Can't create \"$qfn\": $!\n");

    say $fh $json;                      # Encoded by :encoding(UTF-8)
}

sub json_file_to_hash {
    my $qfn = shift;
    open(my $fh, '<', $qfn)             # "use open" will add :encoding(UTF-8)
       or die("Can't open \"$qfn\": $!\n");

    local $/;                           # Read whole file
    my $json = <$fh>;                   # Decoded text thanks to "use open".
    my $ref = from_json($json);         # $ref contains decoded text.
    return $ref;                        # Return the ref rather than the keys and values.
}

my $src = { key => "1я310яHOM_REF_truth:HOM_ALT_test:discordant_hom_ref_to_hom_altяAяC" };
hash_to_json_file("a.json", $src);
my $dst = json_file_to_hash("a.json");
say $dst->{key};
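
On the Python side, a rough counterpart might look like the sketch below. It is not part of the Perl code above, only an illustration under two assumptions: the file should hold literal UTF-8 characters rather than \uXXXX escapes (hence ensure_ascii=False), and the encoding is stated explicitly in open() instead of relying on the platform default. The function names mirror the ones in the question, and the temporary path is hypothetical.

```python
import json
import os
import tempfile


def hash_to_json_file(hashtable, filename):
    # ensure_ascii=False keeps я as a literal character;
    # encoding="utf-8" fixes the bytes written to disk regardless of locale.
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(hashtable, f, ensure_ascii=False)


def json_file_to_hash(filename):
    # Decode the file as UTF-8 before parsing the JSON text.
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f)


src = {"key": "1я310яHOM_REF_truth:HOM_ALT_test:discordant_hom_ref_to_hom_altяAяC"}
path = os.path.join(tempfile.mkdtemp(), "a.json")  # hypothetical scratch location
hash_to_json_file(src, path)
dst = json_file_to_hash(path)
```

A file written this way contains UTF-8 bytes, which is the form decode_json expects when the Perl side reads the file with :raw.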

Upvotes: 2

mob

Reputation: 118605

I like the ascii option, which makes the JSON output all 7-bit ASCII:

my $json = JSON->new->ascii->encode($hash);

Both the Perl and Python JSON modules will be able to read it.
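
This is already what Python's json.dumps does by default (ensure_ascii=True), which is why the Python-written file shows \u044f where the key has я. A minimal sketch, using the key from the question and illustrative variable names:

```python
import json

key = "1я310яHOM_REF_truth:HOM_ALT_test:discordant_hom_ref_to_hom_altяAяC"

# ensure_ascii=True is the default: each non-ASCII character
# is written as a \uXXXX escape, so the output is pure ASCII.
text = json.dumps({key: 1})

# The escape is only a serialization detail; parsing restores я.
restored = json.loads(text)
```

Because the escaped text is plain ASCII, it reads the same whether the file is opened as raw bytes or through a UTF-8 layer.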

Upvotes: 0
