Nico Villanueva

Reputation: 894

JSON encoding in Perl output

Context:
I have to migrate a Perl script to Python. The problem is that the configuration files this Perl script uses are actually valid Perl code. My Python version uses .yaml files as config.

Therefore, I basically had to write a converter between Perl and YAML. From what I found, Perl does not play well with YAML, but there are libraries that dump Perl hashes to JSON, and Python works with JSON almost natively, so I used JSON as an intermediate format: Perl -> JSON -> YAML. The first conversion is done in Perl, and the second in Python (which also does some mangling of the data).

Using the library mentioned by @simbabque, I can output YAML directly from Perl, which I must then modify and play with. As I know next to nothing about Perl, I prefer to do that part in Python.

Problem:
The source config files look something like this:

$sites = {
    "0100101001" => {
        mail => 1,
        from => '[email protected]',
        to => '[email protected]',
        subject => 'á é í ó ú',
        msg => 'á é í ó ú',
        ftp => 0,
        sftp => 0,
    },
    "22222222" => {
[...]

And many more of those.

My "parsing" code is the following:

use strict;
use warnings;

# use JSON;
use YAML;
use utf8;
use Encode;
use Getopt::Long;

my $conf;
GetOptions('conf=s' => \$conf) or die;
our (
    $sites
);
do $conf;

# my $json = encode_json($sites);
my $yaml = Dump($sites);

binmode(STDOUT, ':encoding(utf8)');
# print($json);
print($yaml);

Nothing out of the ordinary. I simply need the YAML version of the Perl data. In fact, it mostly works. My problem is with the encoding.

The output of the above code is this:

  [...snip...]
  mail: 1
  msg: á é í ó ú
  sftp: 0
  subject: á é í ó ú
  [...snip...]

The encoding goes to hell and back. As far as I read, UTF-8 is the default, and just in case, I force it with binmode, but to no avail.

What am I missing here? Any workaround?

Note: I thought it may have been my shell, but locale outputs this:

❯ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Which seems ok.

Note 2: I know next to nothing about Perl, and it is not my intent to become an expert in it, so any enhancements/tips are greatly appreciated too.

Note 3: I read this answer, and my code is loosely based on it. The main difference is that I'm not sure how to encode a file, instead of a simple string.

Upvotes: 0

Views: 811

Answers (1)

mob

Reputation: 118605

The sites config file is UTF-8 encoded, but nothing tells Perl that when it loads the file, so the strings in it are never decoded. Here are three workarounds:

  1. Put the use utf8 pragma inside the sites configuration file. The use utf8 pragma in the main script is not sufficient to treat files included with do/require as UTF-8 encoded.

  2. If that is not feasible, decode the input before you pass it to the JSON encoder. Something like

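    open my $cfg, '<:encoding(UTF-8)', $conf or die "Cannot open $conf: $!";
    my $code = do { local $/; <$cfg> };    # slurp the whole file as decoded text
    close $cfg;
    eval $code;                            # sets $sites, as `do $conf` did
    die $@ if $@;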
    

instead of

    do $conf;

  3. Use JSON::to_json instead of JSON::encode_json. encode_json expects decoded input (Unicode code points) and the output is UTF-8 encoded. The output of to_json is not encoded, or rather, it will have the same encoding as the input, which is what you want.
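
To make the difference concrete, here is a minimal sketch, not taken from the original scripts. It assumes the core JSON::PP module as a stand-in for JSON: an encoder object without the utf8 option is used in place of to_json, since that is what to_json does by default. The hex_bytes helper and the variable names are hypothetical, and the non-ASCII character is written with escapes so the example does not depend on how the file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);
use JSON::PP;    # core module, used here as a stand-in for JSON

# Hypothetical helper: show a string as space-separated hex values.
sub hex_bytes { join ' ', map { sprintf '%02X', ord($_) } split //, shift }

# What Perl holds for the literal 'á' when the file is loaded WITHOUT
# `use utf8`: two separate bytes rather than one character.
my $raw_bytes = "\xC3\xA1";

# The same text once decoded (the effect of workarounds 1 and 2).
my $characters = decode('UTF-8', $raw_bytes);

# encode_json on undecoded bytes treats each byte as a character and
# UTF-8 encodes it again: this is the garbled output.
my $double_encoded = encode_json([$raw_bytes]);

# encode_json on decoded characters gives the intended UTF-8.
my $correct = encode_json([$characters]);

# An encoder without the utf8 option passes the bytes through unchanged,
# so UTF-8 in gives UTF-8 out (workaround 3).
my $passthrough = JSON::PP->new->encode([$raw_bytes]);

print "double encoded: ", hex_bytes($double_encoded), "\n";
print "correct:        ", hex_bytes($correct), "\n";
print "passthrough:    ", hex_bytes($passthrough), "\n";
```

The first line of output carries four bytes for the single character, while the other two carry the expected two.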

There is no need to encode the final output as UTF-8. Using any of the three workarounds will already produce UTF-8 encoded output.
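
As a rough illustration of why the extra binmode layer hurts once the string already holds UTF-8 bytes (as encode_json returns), the sketch below writes the same bytes to two in-memory filehandles, one plain and one with an :encoding(UTF-8) layer. The bytes_written helper is hypothetical and only stands in for printing to STDOUT.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text to an in-memory handle opened with
# $mode and return the bytes that ended up in the buffer.
sub bytes_written {
    my ($mode, $text) = @_;
    my $buffer = '';
    open my $fh, $mode, \$buffer or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh;
    return $buffer;
}

my $utf8_bytes = "\xC3\xA1";    # 'á', already encoded as UTF-8

# No layer: the bytes are written as they are.
my $plain = bytes_written('>', $utf8_bytes);

# With the layer: every byte is treated as a character and encoded again.
my $layered = bytes_written('>:encoding(UTF-8)', $utf8_bytes);

printf "plain:   %d bytes\n", length $plain;
printf "layered: %d bytes\n", length $layered;
```

If the data is still decoded characters when it is printed, the situation is reversed and an encoding layer on the output handle is the right tool.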

Upvotes: 4
