Reputation: 99
I am attempting to open a text file pulled from HDFS, extract certain values, and write them to a CSV file containing a single row. Below are the contents of the text file and the code I am using to extract the data and write the output:
#file.txt
{"timestamp": someInt, "videoId": someString, "overridden": someInt, "scores": [{"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}]}
{"timestamp": someInt, "videoId": someString, "overridden": someInt, "scores": [{"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}, {"bucket": someString, "name": someString, "value": someInt}]}
...
The initial code:
import csv
import json

wanted_data = []
with open('file.txt', 'r') as f:
    for line in f:
        json_data = json.loads(line)
        wanted_data.append(json_data['videoId'])
        for i in range(6):
            wanted_data.append(json_data['scores'][i]['bucket'])
            wanted_data.append(json_data['scores'][i]['value'])

with open('file.csv', 'w+') as f_out:
    write = csv.writer(f_out)
    write.writerow(wanted_data)
This results in a JSONDecodeError:
/usr/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 2 column 1 (char 1)
What is the proper way to load this text file?
Upvotes: 0
Views: 53
Reputation: 14233
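It looks like you have empty lines between the JSON strings. The position in the error (line 2 column 1, char 1) is what json.loads reports for a line that contains only a newline. Check that each line actually has some text before processing it: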
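import json

wanted_data = []
with open('file.txt', 'r') as f:
    for line in f:
        # Skip empty or whitespace-only lines; json.loads cannot parse them.
        if line.strip():
            json_data = json.loads(line)
            wanted_data.append(json_data['videoId'])
            # Iterate over the scores directly instead of assuming exactly six.
            for score in json_data['scores']:
                wanted_data.append(score['bucket'])
                wanted_data.append(score['value'])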
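For completeness, here is a minimal sketch of the whole round trip, including the CSV step from the question. The helper names `extract_row` and `write_single_row` and the sample records are made up for illustration, and the input is simulated with `io.StringIO` so the snippet runs without the real file.

```python
import csv
import io
import json


def extract_row(lines):
    """Collect videoId plus each score's bucket and value into one flat list."""
    row = []
    for line in lines:
        line = line.strip()
        if not line:
            # Blank lines are what make json.loads raise "Expecting value".
            continue
        record = json.loads(line)
        row.append(record["videoId"])
        for score in record["scores"]:
            row.append(score["bucket"])
            row.append(score["value"])
    return row


def write_single_row(row, f_out):
    """Write the flat list as one CSV row to an already-open text file."""
    csv.writer(f_out).writerow(row)


# Made-up sample input with a blank line between the two records.
sample = io.StringIO(
    '{"timestamp": 1, "videoId": "abc", "overridden": 0, '
    '"scores": [{"bucket": "b1", "name": "n1", "value": 10}]}\n'
    '\n'
    '{"timestamp": 2, "videoId": "def", "overridden": 1, '
    '"scores": [{"bucket": "b2", "name": "n2", "value": 20}]}\n'
)
row = extract_row(sample)

out = io.StringIO()
write_single_row(row, out)
csv_text = out.getvalue()
```

With the real files you would pass the open handle for file.txt to `extract_row`, and open file.csv with `open('file.csv', 'w', newline='')` before calling `write_single_row`. The csv module documentation recommends `newline=''` so the writer controls line endings itself.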
Upvotes: 1