holys

Reputation: 14809

dump json into yaml

I have a JSON file (meta.json) like this:

{
    "main": {
        "title": "今日は雨が降って",
        "description": "今日は雨が降って"
    }
}

I would like to convert it to a YAML file (meta.yaml) like this:

title: "今日は雨が降って"
description: "今日は雨が降って"

What I have done so far:

import simplejson as json
import yaml  # PyYAML is imported as "yaml", not "pyyaml"

# Read the source JSON
with open('meta.json', 'r') as f:
    jsonData = json.load(f)

# Only the values under "main" are needed in the YAML file
yamlData = {
    'title': jsonData['main']['title'],
    'description': jsonData['main']['description'],
}

with open('meta.yaml', 'w') as ff:
    yaml.dump(yamlData, ff)

But sadly, what I got is the following:

{description: "\u4ECA\u65E5\u306F\u96E8\u304C\u964D\u3063\u3066", title: "\u4ECA\u65E5\
\u306F\u96E8\u304C\u964D\u3063"}

Why?

Upvotes: 37

Views: 116991

Answers (6)

scorpp

Reputation: 668

Here is an enhancement of the previous answers that can use either files or stdin/stdout for input and output.

#!/usr/bin/env python
import argparse
import sys
import json
from argparse import ArgumentParser

import yaml

parser = ArgumentParser("json2yaml", description="Convert JSON to YAML")
parser.add_argument(
    "--input",
    "-i",
    type=argparse.FileType("r"),
    default=sys.stdin,
    help="Input JSON file (stdin by default)",
)
parser.add_argument(
    "--output",
    "-o",
    type=argparse.FileType("w"),
    default=sys.stdout,
    help="Output YAML file (stdout by default)",
)
args = parser.parse_args()

yaml.dump(json.load(args.input), stream=args.output, default_flow_style=False)

Save this script as json2yaml somewhere on your PATH (I use ~/.local/bin on Fedora; your mileage may vary), run chmod 755 on it, and then use it like this:

$ curl https://some.com/test.json | json2yaml -o test.yaml

or

$ json2yaml -i test.json | ...

or

$ json2yaml -i test.json -o test.yaml

Upvotes: 1

shoma

Reputation: 638

PyYAML's yaml.dump() has an allow_unicode option that defaults to None, in which case all non-ASCII characters in the output are escaped. With allow_unicode=True, it writes the Unicode characters unescaped.

yaml.dump(data, ff, allow_unicode=True)
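To see the difference side by side, here is a minimal sketch, assuming PyYAML is installed; it dumps to strings instead of a file, and the variable names are only illustrative:

```python
import yaml  # PyYAML, assumed to be installed

data = {"title": "今日は雨が降って"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes
escaped = yaml.dump(data)

# allow_unicode=True: the characters are written unescaped
raw = yaml.dump(data, allow_unicode=True)

print(escaped)
print(raw)
```

Both strings describe the same data; only the text representation differs.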

Bonus

You can also dump JSON without escaping non-ASCII characters, as follows:

json.dump(data, outfile, ensure_ascii=False)
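A small sketch of what ensure_ascii changes, using json.dumps on an in-memory dict rather than a file (the dict and variable names are just for illustration):

```python
import json

data = {"title": "今日は雨が降って"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes
escaped = json.dumps(data)

# ensure_ascii=False: characters are kept as-is in the output string
raw = json.dumps(data, ensure_ascii=False)

print(escaped)
print(raw)
```

When writing the unescaped form to a file, it is probably safest to open the file with an explicit encoding such as encoding="utf-8".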

Upvotes: 42

Saurabh Hirani

Reputation: 1248

This works for me:

#!/usr/bin/env python

import sys
import json
import yaml

print(yaml.dump(json.load(open(sys.argv[1])), default_flow_style=False))

So what we are doing is:

  1. Load the JSON file with json.load.
  2. Dump the resulting data with yaml.dump. With default_flow_style=True the data is written inline (flow style); with False it is written in block style, one key per line.

This takes care of Unicode as described in "How to get string objects instead of Unicode from JSON?"
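The effect of default_flow_style can be seen in a short sketch, assuming PyYAML is installed; the sample dict is made up for illustration:

```python
import yaml  # PyYAML, assumed to be installed

data = {"title": "rain", "description": "rain"}

# Flow style: the whole mapping on one line, in braces
flow = yaml.dump(data, default_flow_style=True)

# Block style: one "key: value" pair per line
block = yaml.dump(data, default_flow_style=False)

print(flow)
print(block)
```

Keys come out alphabetically because yaml.dump sorts mapping keys by default.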

Upvotes: 20

Mitar

Reputation: 7080

I simply do:

#!/usr/bin/env python
import sys
import json
import yaml

yaml.safe_dump(json.load(sys.stdin), sys.stdout, default_flow_style=False)

Upvotes: 5

root

Reputation: 80456

In [1]: import json, yaml

In [2]: with open('test.json') as js:
   ...:     data = json.load(js)[u'main']
   ...:     

In [3]: with open('test.yaml', 'w') as yml:
   ...:     yaml.dump(data, yml, allow_unicode=True)
   ...:     

In [4]: ! cat test.yaml
{!!python/unicode 'description': 今日は雨が降って, !!python/unicode 'title': 今日は雨が降って}

In [5]: with open('test.yaml', 'w') as yml:
   ...:     yaml.safe_dump(data, yml, allow_unicode=True)
   ...:     

In [6]: ! cat test.yaml
{description: 今日は雨が降って, title: 今日は雨が降って}
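For the block layout asked for in the question, the same approach can be sketched on Python 3 as follows. This assumes PyYAML is installed and uses an in-memory JSON string in place of test.json so the snippet is self-contained; adding default_flow_style=False is my assumption about the desired layout:

```python
import json
import yaml  # PyYAML, assumed to be installed

# Stand-in for the contents of the JSON file
json_text = '{"main": {"title": "今日は雨が降って", "description": "今日は雨が降って"}}'

# Keep only the mapping under "main"
data = json.loads(json_text)["main"]

# safe_dump restricts output to standard YAML tags;
# allow_unicode keeps the Japanese text unescaped;
# default_flow_style=False writes one key per line
yaml_text = yaml.safe_dump(data, allow_unicode=True, default_flow_style=False)

print(yaml_text)
```

Note that the values come out without the double quotes shown in the question, since plain scalars do not need them here.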

Upvotes: 4

DhruvPathak

Reputation: 43265

The output is correct. The "\u...." sequences are the escaped Unicode representation of your Japanese string. Once the data is decoded and shown with a proper encoding, it should display fine wherever you use it, e.g. on a web page.

Note that the data is equal in spite of its different string representations:

>>> import json
>>> j = '{    "main": {        "title": "今日は雨が降って",        "description": "今日は雨が降って"    }}'
>>> s = json.loads(j)
>>> t = json.dumps(s)
>>> j
'{    "main": {        "title": "\xe4\xbb\x8a\xe6\x97\xa5\xe3\x81\xaf\xe9\x9b\xa8\xe3\x81\x8c\xe9\x99\x8d\xe3\x81\xa3\xe3\x81\xa6",        "description": "\xe4\xbb\x8a\xe6\x97\xa5\xe3\x81\xaf\xe9\x9b\xa8\xe3\x81\x8c\xe9\x99\x8d\xe3\x81\xa3\xe3\x81\xa6"    }}'
>>> t
'{"main": {"description": "\\u4eca\\u65e5\\u306f\\u96e8\\u304c\\u964d\\u3063\\u3066", "title": "\\u4eca\\u65e5\\u306f\\u96e8\\u304c\\u964d\\u3063\\u3066"}}'
>>> s == json.loads(t)
True
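The same round-trip check can be sketched for the YAML output itself. This assumes Python 3 with PyYAML installed, and the variable names are illustrative only:

```python
import json
import yaml  # PyYAML, assumed to be installed

j = '{"main": {"title": "今日は雨が降って", "description": "今日は雨が降って"}}'
original = json.loads(j)["main"]

# Dump with the default settings, which escape non-ASCII characters
escaped_yaml = yaml.safe_dump(original)

# Loading the escaped text back decodes the \uXXXX sequences
restored = yaml.safe_load(escaped_yaml)

print(restored == original)
```

So the escaped file loses no information; allow_unicode=True only matters for how readable the file is to a human.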

Upvotes: 2
