Reputation: 14809
I have a .json file (named meta.json) like this:
{
"main": {
"title": "今日は雨が降って",
"description": "今日は雨が降って"
}
}
I would like to convert it to a .yaml file (named meta.yaml) like this:
title: "今日は雨が降って"
description: "今日は雨が降って"
What I did was:
import simplejson as json
import yaml
f = open('meta.json', 'r')
jsonData = json.load(f)
f.close()
ff = open('meta.yaml', 'w+')
yamlData = {'title':'', 'description':''}
yamlData['title'] = jsonData['main']['title']
yamlData['description'] = jsonData['main']['description']
yaml.dump(yamlData, ff)
# So you can see that what I need is the value of meta.json
But sadly, what I got is the following:
{description: "\u4ECA\u65E5\u306F\u96E8\u304C\u964D\u3063\u3066", title: "\u4ECA\u65E5\
\u306F\u96E8\u304C\u964D\u3063"}
Why?
Upvotes: 37
Views: 116991
Reputation: 668
My enhancement, based on the previous answers, can use either files or stdin/stdout for input and output.
#!/usr/bin/env python
import argparse
import sys
import json
from argparse import ArgumentParser
import yaml
parser = ArgumentParser("json2yaml", description="Convert JSON to YAML")
parser.add_argument(
"--input",
"-i",
type=argparse.FileType("r"),
default=sys.stdin,
help="Input JSON file (stdin by default)",
)
parser.add_argument(
"--output",
"-o",
type=argparse.FileType("w"),
default=sys.stdout,
help="Output YAML file (stdout by default)",
)
args = parser.parse_args()
yaml.dump(json.load(args.input), stream=args.output, default_flow_style=False)
Then put this script, named json2yaml, somewhere on your path (I use ~/.local/bin here on Fedora; your mileage may vary), run chmod 755 on it, and then use it like this:
$ curl https://some.com/test.json | json2yaml -o test.yaml
or
$ json2yaml -i test.json | ...
or
$ json2yaml -i test.json -o test.yaml
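Note that the script above does not pass allow_unicode, so non-ASCII text such as the Japanese strings in the question would still come out escaped. A minimal sketch of the same stream-to-stream conversion with that option added follows; the function name json_to_yaml is made up for illustration, and it assumes PyYAML is installed and that the streams are text streams.

```python
import io
import json

import yaml  # PyYAML


def json_to_yaml(src, dst):
    """Read JSON from the text stream src and write it as YAML to dst."""
    yaml.safe_dump(
        json.load(src),
        dst,
        default_flow_style=False,  # block style, one key per line
        allow_unicode=True,        # write non-ASCII characters as-is
    )


# In-memory streams stand in for stdin/stdout or opened files.
src = io.StringIO('{"main": {"title": "今日は雨が降って"}}')
dst = io.StringIO()
json_to_yaml(src, dst)
result = dst.getvalue()
print(result)
```

The same function should work with args.input and args.output from the argparse script, since argparse.FileType hands back ordinary file objects.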
Upvotes: 1
Reputation: 638
yaml.dump() (from PyYAML) has an allow_unicode option that defaults to None, in which case all non-ASCII characters in the output are escaped. If allow_unicode=True, it writes raw Unicode strings.
yaml.dump(data, ff, allow_unicode=True)
You can likewise dump JSON without escaping non-ASCII characters:
json.dump(data, outfile, ensure_ascii=False)
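To see the effect of the option side by side, here is a small sketch that dumps the question's data both ways and checks that the two outputs load back to the same thing. It assumes PyYAML is installed and uses safe_dump/safe_load; the variable names are only for illustration.

```python
import yaml  # PyYAML

data = {"title": "今日は雨が降って", "description": "今日は雨が降って"}

# Default behaviour: non-ASCII characters are written as \u escapes.
escaped = yaml.safe_dump(data, default_flow_style=False)

# With allow_unicode=True the characters are written as-is.
readable = yaml.safe_dump(data, default_flow_style=False, allow_unicode=True)

# Both forms are valid YAML and load back to the same data.
same = yaml.safe_load(escaped) == yaml.safe_load(readable) == data

print(escaped)
print(readable)
print(same)
```

When writing the readable form to a file, it is probably safest to open the file with encoding="utf-8" explicitly, since the platform default encoding may not be able to represent the characters.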
Upvotes: 42
Reputation: 1248
This works for me:
#!/usr/bin/env python
import sys
import json
import yaml
print(yaml.dump(json.load(open(sys.argv[1])), default_flow_style=False))
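So what we are doing is loading the JSON file named by the first command-line argument with json.load, then dumping the resulting object with yaml.dump in block style (default_flow_style=False).
This takes care of unicode as per "How to get string objects instead of Unicode from JSON?"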
Upvotes: 20
Reputation: 7080
I simply do:
#!/usr/bin/env python
import sys
import json
import yaml
yaml.safe_dump(json.load(sys.stdin), sys.stdout, default_flow_style=False)
Upvotes: 5
Reputation: 80456
In [1]: import json, yaml
In [2]: with open('test.json') as js:
   ...:     data = json.load(js)[u'main']
   ...:
In [3]: with open('test.yaml', 'w') as yml:
   ...:     yaml.dump(data, yml, allow_unicode=True)
   ...:
In [4]: ! cat test.yaml
{!!python/unicode 'description': 今日は雨が降って, !!python/unicode 'title': 今日は雨が降って}
In [5]: with open('test.yaml', 'w') as yml:
   ...:     yaml.safe_dump(data, yml, allow_unicode=True)
   ...:
In [6]: ! cat test.yaml
{description: 今日は雨が降って, title: 今日は雨が降って}
Upvotes: 4
Reputation: 43265
This is correct. The "\u...." strings are the escaped Unicode representation of your Japanese string. When you decode it and use it with the proper encoding, it should display fine wherever you use it, e.g. on a web page.
Note that the data are equal despite the different string representations:
>>> import json
>>> j = '{ "main": { "title": "今日は雨が降って", "description": "今日は雨が降って" }}'
>>> s = json.loads(j)
>>> t = json.dumps(s)
>>> j
'{ "main": { "title": "\xe4\xbb\x8a\xe6\x97\xa5\xe3\x81\xaf\xe9\x9b\xa8\xe3\x81\x8c\xe9\x99\x8d\xe3\x81\xa3\xe3\x81\xa6", "description": "\xe4\xbb\x8a\xe6\x97\xa5\xe3\x81\xaf\xe9\x9b\xa8\xe3\x81\x8c\xe9\x99\x8d\xe3\x81\xa3\xe3\x81\xa6" }}'
>>> t
'{"main": {"description": "\\u4eca\\u65e5\\u306f\\u96e8\\u304c\\u964d\\u3063\\u3066", "title": "\\u4eca\\u65e5\\u306f\\u96e8\\u304c\\u964d\\u3063\\u3066"}}'
>>> s == json.loads(t)
True
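The session above is from Python 2. A rough Python 3 equivalent, using only the standard library, shows the same point: the escaped and unescaped JSON strings look different but decode to identical data. The variable names are just for illustration.

```python
import json

original = {"main": {"title": "今日は雨が降って"}}

# ensure_ascii=True is the default, so non-ASCII characters become \u escapes.
escaped = json.dumps(original)

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(original, ensure_ascii=False)

print(escaped)
print(readable)

# Different text, same data once parsed.
round_trip_ok = json.loads(escaped) == json.loads(readable) == original
print(round_trip_ok)
```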
Upvotes: 2