Reputation: 3026
I am trying to build a Python dict that, when dumped to JSON, produces the following structure:
{"sentences": [{"sentence": "At the end of November 2005 , Hong Kong and America had 132 licensed banks , 41 restricted licensed banks , 35 deposit-taking institutions , and 86 representative offices .","parsedSentence": "xyz in text e.g. At the end of November 2005 , LOCATION_SLOT and LOCATION_SLOT had NUMBER_SLOT licensed banks , NUMBER_SLOT restricted licensed banks , NUMBER_SLOT deposit-taking institutions , and NUMBER_SLOT representative offices .","location-value-pairs": [{"America": 132}, {"America": 41}, {"America": 35},
{"Hong Kong": 132}, {"Hong Kong": 41}, {"Hong Kong": 35}]}]}
However, I can't work out how to build this structure: two levels of nested keys, then a third level whose keys each hold an array.
My current code is below (note that I couldn't get keys like "sentence", "parsedSentence", etc. to be created). My keys are currently the strings themselves rather than key variables; I want to move away from that so that I can traverse this Python dictionary more quickly in future:
    for sentence in parsedSentences:
        wordsInSentence = []
        for token in sentence["tokens"]:
            wordsInSentence.append(token["word"])
        sentence = " ".join(wordsInSentence)
        for locationTokenIDs, location in tokenIDs2location.items():
            for numberTokenIDs, number in tokenIDs2number.items():
                if sentence not in sentences2location2values:
                    sentences2location2values[sentence] = {}
                if location not in sentences2location2values[sentence]:
                    sentences2location2values[sentence][location] = []
                sentences2location2values[sentence][location].append(number)
    with open(outputFile, "wb") as out:
        json.dump(sentences2location2values, out)
This gives me a JSON looking like this:
{"Mobutu Sese Seku seized power in 1965 via a coup , renaming the country Zaire , and reigning for the next 32 years as head of a ruthless and corrupt dictatorship .": {"Zaire": [32.0]}, "\u00c3 cents \u00c2 $ \u00c2 cents Movement for the Liberation of the Congo -LRB- MLC -RRB- : Under the direction of Bemba , and backed by Uganda , the MLC was formed in 1998 with 154 soldiers .": {"Congo": [154.0], "Uganda": [154.0]}, ...
This is not the structure I need.
How can I fill in the right keys and values one by one at the appropriate points in the loop, rather than with a one-line solution?
Upvotes: 0
Views: 175
Reputation: 3250
It seems like there's somewhat of a mismatch between the ideal output at the beginning of your question and what the code actually does, in that the code doesn't create the keys "sentence", "parsedSentence" and "location-value-pairs".
This may just mean I've misunderstood the question, but if not, you could try something like:
    output = {"sentences": []}
    for sentence in parsedSentences:
        # One dict per sentence; keep the parsed form before "sentence" is rebound below.
        sentenceDict = {"parsedSentence": sentence}
        wordsInSentence = []
        for token in sentence["tokens"]:
            wordsInSentence.append(token["word"])
        sentence = " ".join(wordsInSentence)
        sentenceDict["sentence"] = sentence
        sentenceDict["location-value-pairs"] = []
        # Pair every location with every number, one single-entry dict per pair.
        for locationTokenIDs, location in tokenIDs2location.items():
            for numberTokenIDs, number in tokenIDs2number.items():
                sentenceDict["location-value-pairs"].append({location: number})
        output["sentences"].append(sentenceDict)
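If it helps to see the whole thing run end to end, here is a minimal, self-contained sketch of the same idea. The sample data is made up (I don't know what your parsedSentences, tokenIDs2location and tokenIDs2number really contain, so the shapes below are assumptions), and the constant names such as SENTENCES and PAIRS are hypothetical; they are just one way to get the "key variables" you mentioned so the key strings are defined in a single place.

```python
import json

# Key names kept in constants (hypothetical names, not from the question).
SENTENCES = "sentences"
SENTENCE = "sentence"
PARSED_SENTENCE = "parsedSentence"
PAIRS = "location-value-pairs"

# Made-up stand-ins for the question's inputs; the real shapes are assumed.
parsedSentences = [
    {"tokens": [{"word": "Hong"}, {"word": "Kong"}, {"word": "had"},
                {"word": "132"}, {"word": "banks"}, {"word": "."}]}
]
tokenIDs2location = {(0, 1): "Hong Kong"}
tokenIDs2number = {(3,): 132}

output = {SENTENCES: []}
for parsed in parsedSentences:
    # Rebuild the plain-text sentence from its tokens.
    wordsInSentence = []
    for token in parsed["tokens"]:
        wordsInSentence.append(token["word"])

    # Fill in the per-sentence keys one at a time.
    sentenceDict = {}
    sentenceDict[SENTENCE] = " ".join(wordsInSentence)
    sentenceDict[PARSED_SENTENCE] = parsed
    sentenceDict[PAIRS] = []

    # Pair every location with every number.
    for location in tokenIDs2location.values():
        for number in tokenIDs2number.values():
            sentenceDict[PAIRS].append({location: number})

    output[SENTENCES].append(sentenceDict)

# Serialise to a string; json.dump(output, fileObject) would write to a file instead.
result = json.dumps(output)
print(result)
```

Each sentence gets its own dict that is appended to the "sentences" list, so the keys are filled in step by step inside the loop rather than in a single expression.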
Upvotes: 1