Reputation: 19
Previously, I read from a CSV file and got the min, max and average of my data in the CSV file. I'm trying to read the same data from a JSON file, and write output to CSV, but I'm not understanding how to do it. Any help is greatly appreciated. My JSON file is as follows:
{
"data": [
{
"time": "2015-10-14 15:01:10",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 19,
"d4": 6.21,
"d4": 105.99,
"d5": 42,
"d6": 59.24
}
},
{
"time": "2015-10-14 15:01:20",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
} .. ..
The code that I've so far is:
df = pd.read_json('data.json', convert_dates = True) df['time'] = [pd.to_datetime(d) for d in df['time']] df = df.set_index('time') hourly_stats = d.groupby(pd.TimeGrouper('H')) print((hourly_stats).agg([np.mean, np.min, np.max])) ((hourly_stats).agg([np.mean, np.min, np.max])).to_csv('file.csv')
Upvotes: 1
Views: 561
Reputation: 884
Well, your actual code and your description of what you're trying to do seem a bit different. Hopefully this will help a bit, all you need to do is redefine the headers and stick your business logic in the "json_to_dict" function and you should be good to go.
import json
import csv
def to_csv(json_obj, fname='my_csv.csv'):
with open(fname, 'w') as f:
to_write = json_to_writable_dict(json_obj)
fieldnames = ['time'] + ['d{}'.format(i) for i in range(1, 8)]
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for row in to_write:
writer.writerow(row)
return fname
def json_to_writable_dict(json_obj):
data, values, time = 'data', 'values', 'time'
json_dict = dict(json_obj)
to_write = []
for item in json_dict[data]:
row = {'d{}'.format(i): item[values]['d{}'.format(i)] for i in range(1, 8)}
row.update({'time': item[time]})
to_write.append(row)
return to_write
def main():
s = '''{
"data": [
{
"time": "2015-10-14 15:01:10",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 19,
"d4": 6.21,
"d5": 105.99,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 15:01:20",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
}
]
}'''
json_thing = json.loads(s)
csv_name = to_csv(json_obj=json_thing)
with open(csv_name) as f:
print(f.read())
if __name__ == '__main__':
main()
Upvotes: 0
Reputation: 3663
First of all, your JSON is incorrect. Correct it, and Validate it before use. After that, you can do something like this to get the data in python:
import json
fp =open('/path/tp/my/file')
mystr = fp.read()
fp.close()
data = json.loads(mystr)
Upvotes: 2
Reputation: 6383
I slightly modified your JSON string and added one more record to have diferent 'Hour' groups.
import pandas as pd
import numpy as np
import json
jsondata = '''{
"data": [
{
"time": "2015-10-14 15:01:10",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 19,
"d4": 6.21,
"d5": 105.99,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 15:01:20",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 16:01:20",
"values": {
"d1": 31956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
}
]
}
'''
data = json.loads(jsondata)['data']
#If your JSON data is in a file, then do:
#data = json.load(jsonfile)['data']
df = pd.DataFrame(data=[record['values'] for record in data],
index=pd.DatetimeIndex([record['time'] for record in data], name='time'))
print df
print df.groupby(pd.Grouper(freq='H')).agg([np.mean, max, min])
Output(df
):
d1 d2 d3 d4 d5 d6 d7
time
2015-10-14 15:01:10 3956.58 0 19 6.21 105.99 42 59.24
2015-10-14 15:01:20 3956.58 0 1 0.81 121.57 42 59.24
2015-10-14 16:01:20 31956.58 0 1 0.81 121.57 42 59.24
Output statistics:
d1 d2 d3 \
mean max min mean max min mean max min
time
2015-10-14 15:00:00 3956.58 3956.58 3956.58 0 0 0 10 19 1
2015-10-14 16:00:00 31956.58 31956.58 31956.58 0 0 0 1 1 1
d4 ... d5 d6 \
mean ... min mean max min mean max min
time ...
2015-10-14 15:00:00 3.51 ... 0.81 113.78 121.57 105.99 42 42 42
2015-10-14 16:00:00 0.81 ... 0.81 121.57 121.57 121.57 42 42 42
d7
mean max min
time
2015-10-14 15:00:00 59.24 59.24 59.24
2015-10-14 16:00:00 59.24 59.24 59.24
[2 rows x 21 columns]
Using pd.read_json
directly seems not working because resulting dataframe has unexpected structure which is hard to use.
Upvotes: 2
Reputation: 830
As you can see "data" is actually an array, look at the open bracket after it. So you would want to go first member of the array, then to time. Since it is truncated I am going to assume that all the members of the array are the same. So to access you would want something like data[0]['time']
Upvotes: 0