GainesvilleJesus

Reputation: 27

Using scraped JSON data containing Unicode in Python

I scraped some JSON data into a file called 'wotd-page-one.json' using Scrapy. The JSON data contains some Spanish words, and the accented letters were converted to Unicode escape sequences. I'd like to load this data and make it usable in a Python script in the same directory. I am trying to load the data into a list so that I can work with each JSON key and value individually, but I am having a hard time because I don't have much experience with Unicode and JSON. Could anyone please help me find a way to make this data accessible via a Python list? Ideally, I'd like something like data[2] == "DEF", data[3] == "string with any unicode characters converted to latin-1", data[4] == "SENTENCE", and data[5] == "string with any unicode characters converted to latin-1".

Python file:

data = []
with open('wotd-page-one.json', encoding='utf-8') as f:
    for line in f:
        line = line.replace('\n', '')
        data.append(line)
print(data)


JSON file:

[
{"TRANSLATION": "I don't like how that guy's whistling; it gives me the creeps.", "WORD": "silbar", "DEF": "to whistle", "SENTENCE": "No me gusta c\u00f3mo silba ese se\u00f1or; me da escalofr\u00edos."},
{"TRANSLATION": "\"Is somebody there?\" asked the boy in a startled voice.", "WORD": "sobresaltado", "DEF": "startled", "SENTENCE": "\"\u00bfHay alguien aqu\u00ed?\" pregunt\u00f3 el ni\u00f1o con voz sobresaltada."},
{"TRANSLATION": "Carla made a face at me when I asked her if she was scared.", "WORD": "la mueca", "DEF": "face", "SENTENCE": "Carla me hizo una mueca cuando le pregunt\u00e9 si ten\u00eda miedo."},
{"TRANSLATION": "The teacher tapped the board with the chalk.", "WORD": "golpetear", "DEF": "to tap", "SENTENCE": "El maestro golpete\u00f3 el pizarr\u00f3n con la tiza."}
]

Output:
 ['[', 
'{"TRANSLATION": "I don\'t like how that guy\'s whistling; it gives me the creeps.", "WORD": "silbar", "DEF": "to whistle", "SENTENCE": "No me gusta c\\u00f3mo silba ese se\\u00f1or; me da escalofr\\u00edos."},', '
{"TRANSLATION": "\\"Is somebody there?\\" asked the boy in a startled voice.", "WORD": "sobresaltado", "DEF": "startled", "SENTENCE": "\\"\\u00bfHay alguien aqu\\u00ed?\\" pregunt\\u00f3 el ni\\u00f1o con voz sobresaltada."},', '
{"TRANSLATION": "Carla made a face at me when I asked her if she was scared.", "WORD": "la mueca", "DEF": "face", "SENTENCE": "Carla me hizo una mueca cuando le pregunt\\u00e9 si ten\\u00eda miedo."},', '
{"TRANSLATION": "The teacher tapped the board with the chalk.", "WORD": "golpetear", "DEF": "to tap", "SENTENCE": "El maestro golpete\\u00f3 el pizarr\\u00f3n con la tiza."}', ']']

Upvotes: 0

Views: 661

Answers (2)

user6357139

Reputation:

The first line of the JSON file is "[". If you try to parse that line on its own, an exception is raised because it is not valid JSON. Reading line by line disregards the structure of the rest of the file, so you shouldn't do this. Instead, just use json.load, like so:

import json

with open("wotd-page-one.json") as f:
    data = json.load(f)
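
To illustrate why parsing a single line fails while parsing the whole document works, here is a minimal sketch. It uses json.loads on an in-memory string rather than reading the real file, and the shortened `text` value is a hypothetical stand-in for the file's contents, not the actual data.

```python
import json

# Hypothetical, shortened stand-in for the contents of wotd-page-one.json.
text = '[\n{"WORD": "silbar", "DEF": "to whistle"},\n{"WORD": "golpetear", "DEF": "to tap"}\n]'

# Parsing only the first line fails: "[" alone is an unterminated array.
first_line = text.splitlines()[0]
try:
    json.loads(first_line)
    first_line_ok = True
except json.JSONDecodeError:
    first_line_ok = False

# Parsing the complete text yields a list of dictionaries.
data = json.loads(text)

print(first_line_ok)
print(data[1]["DEF"])
```

json.load(f) behaves the same way as json.loads here, except that it reads the text from an open file object.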

Upvotes: 0

Mark Tolonen

Reputation: 177891

With a JSON file, you can load it in one operation. It will be turned into a Python structure...in this case, a list of dictionaries. For example:

import json

with open('wotd-page-one.json') as f:
    data = json.load(f)

for d in data:
    print(d['SENTENCE'])

Output:

No me gusta cómo silba ese señor; me da escalofríos.
"¿Hay alguien aquí?" preguntó el niño con voz sobresaltada.
Carla me hizo una mueca cuando le pregunté si tenía miedo.
El maestro golpeteó el pizarrón con la tiza.
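
The question also asks for a flat list of keys and values and mentions latin-1. The sketch below is one possible way to get there, not the only one. It assumes Python 3.7+ (so dictionaries keep the key order of the file) and uses a one-entry, shortened copy of the data as an inline string; the names `raw`, `flat`, and `latin1_bytes` are illustrative. Note that json already turns the \uXXXX escapes into ordinary Python strings, so encoding to latin-1 is only needed if you want bytes.

```python
import json

# One shortened entry from the question, kept as a raw string so the
# \uXXXX escapes reach the JSON parser unchanged.
raw = r'[{"WORD": "silbar", "DEF": "to whistle", "SENTENCE": "No me gusta c\u00f3mo silba ese se\u00f1or; me da escalofr\u00edos."}]'

entries = json.loads(raw)

# Flatten each dictionary into alternating key, value items.
flat = []
for entry in entries:
    for key, value in entry.items():
        flat.extend([key, value])

# The escapes are already decoded: this is a normal str with accented letters.
sentence = flat[5]

# Only if bytes are required: encode the text as latin-1.
# This raises UnicodeEncodeError for characters outside latin-1.
latin1_bytes = sentence.encode("latin-1")

print(flat[2], "->", flat[3])
print(flat[4], "->", sentence)
```

With this layout, flat[2] is "DEF" and flat[4] is "SENTENCE", which matches the indexing described in the question. Keeping the list of dictionaries and using d['DEF'] is usually easier to read, though.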

Upvotes: 1
