Reputation: 21
Noob here. I have a large number of JSON files, each containing a series of blog posts in a different language. The key-value pairs are metadata about the posts, e.g. {'author':'John Smith', 'translator':'Jane Doe'}. What I want to do is convert each post to a Python dictionary, then extract the values so that I have a list of all the authors and translators across all the posts.
for lang in languages:
    f = 'posts-' + lang + '.json'
    file = codecs.open(f, 'rt', 'utf-8')
    line = string.strip(file.next())
    postAuthor[lang] = []
    postTranslator[lang]=[]
    while (line):
        data = json.loads(line)
        print data['author']
        print data['translator']
When I tried this method, I kept getting a KeyError for 'translator' and I'm not sure why. I've never worked with the json module before, so I tried a more complex method to see what would happen:
postAuthor[lang].append(data['author'])
for translator in data.keys():
    if not data.has_key('translator'):
        postTranslator[lang] = ""
    postTranslator[lang] = data['translator']
It keeps raising an error saying that strings do not have an append method. This seems like a simple task and I'm not sure what I'm doing wrong.
Upvotes: 2
Views: 7346
Reputation: 71
See if this works for you:
import json
# you have lots of "posts", so let's assume
# you've stored them in some list. We'll use
# the example text you gave as one of the entries
# in said list
posts = ["{'author':'John Smith', 'translator':'Jane Doe'}"]
# strictly speaking, the single quotes in your example aren't
# valid JSON, so you'll want to switch the single quotes
# out for double quotes; you can verify this with something
# like http://jsonlint.com/
# luckily, you can easily swap out all the quotes programmatically
# so let's loop through the posts, and store the authors and translators
# in two lists
authors = []
translators = []
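for post in posts:
    # note: this also rewrites apostrophes inside values (e.g. O'Brien)
    double_quotes_post = post.replace("'", '"')
    json_data = json.loads(double_quotes_post)
    # .get() returns None instead of raising KeyError when a key is missing
    author = json_data.get('author', None)
    translator = json_data.get('translator', None)
    if author: authors.append(author)
    if translator: translators.append(translator)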
# and there you have it, a list of authors and translators
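
As for the two errors in the question: the KeyError most likely means that some posts simply have no 'translator' key, and the append error is consistent with postTranslator[lang] = "" replacing the list with a string, so a later .append on it fails. Below is a minimal sketch of the same idea as above, written for Python 3 and assuming one JSON object per line, as the question's json.loads(line) suggests. The helper names parse_post and collect_people and the sample lines are made up for illustration; ast.literal_eval is swapped in as a fallback for single-quoted lines so that apostrophes inside names survive.

```python
import ast
import json


def parse_post(line):
    """Parse one line into a dict, tolerating single-quoted pseudo-JSON."""
    try:
        return json.loads(line)
    except ValueError:
        # Assumption: a line that is not valid JSON is a Python-style
        # dict literal such as {'author': 'John Smith'}.
        return ast.literal_eval(line)


def collect_people(lines):
    """Return (authors, translators) found in an iterable of post lines."""
    authors, translators = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        data = parse_post(line)
        # dict.get returns None for a missing key instead of raising KeyError
        author = data.get('author')
        translator = data.get('translator')
        if author:
            authors.append(author)
        if translator:
            translators.append(translator)
    return authors, translators


# Hypothetical sample data: one post without a translator,
# one single-quoted line containing an apostrophe.
sample = [
    '{"author": "John Smith", "translator": "Jane Doe"}',
    '{"author": "Ann Lee"}',
    "{'author': \"Pat O'Brien\", 'translator': 'Sam Roe'}",
]
authors, translators = collect_people(sample)
```

To use it per language, you could open each file with open('posts-' + lang + '.json', encoding='utf-8') and pass the file object straight to collect_people, since iterating over a file yields its lines; the two returned lists can then be stored in postAuthor[lang] and postTranslator[lang].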
Upvotes: 2