user3264280
user3264280

Reputation: 19

Reading from JSON file in Python and writing to CSV

Previously, I read from a CSV file and got the min, max and average of my data in the CSV file. I'm trying to read the same data from a JSON file, and write output to CSV, but I'm not understanding how to do it. Any help is greatly appreciated. My JSON file is as follows:

{
"data": [
{
"time": "2015-10-14 15:01:10",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 19,
"d4": 6.21,
"d4": 105.99,
"d5": 42,
"d6": 59.24
}
},
{
"time": "2015-10-14 15:01:20",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
} .. ..

The code that I've so far is:

df = pd.read_json('data.json', convert_dates = True)  
df['time'] = [pd.to_datetime(d) for d in df['time']]  
df = df.set_index('time') 
hourly_stats = d.groupby(pd.TimeGrouper('H'))  
print((hourly_stats).agg([np.mean, np.min, np.max]))  
((hourly_stats).agg([np.mean, np.min, np.max])).to_csv('file.csv')

Upvotes: 1

Views: 561

Answers (4)

snakes_on_a_keyboard
snakes_on_a_keyboard

Reputation: 884

Well, your actual code and your description of what you're trying to do seem a bit different. Hopefully this will help a bit, all you need to do is redefine the headers and stick your business logic in the "json_to_dict" function and you should be good to go.

import json
import csv


def to_csv(json_obj, fname='my_csv.csv'):
    with open(fname, 'w') as f:
        to_write = json_to_writable_dict(json_obj)

        fieldnames = ['time'] + ['d{}'.format(i) for i in range(1, 8)]
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in to_write:
            writer.writerow(row)

    return fname

def json_to_writable_dict(json_obj):
    data, values, time = 'data', 'values', 'time'
    json_dict = dict(json_obj)
    to_write = []
    for item in json_dict[data]:
        row = {'d{}'.format(i): item[values]['d{}'.format(i)] for i in range(1, 8)}
        row.update({'time': item[time]})
        to_write.append(row)
    return to_write

def main():
    s = '''{
"data": [
{
  "time": "2015-10-14 15:01:10",
  "values": {
    "d1": 3956.58,
    "d2": 0,
    "d3": 19,
    "d4": 6.21,
    "d5": 105.99,
    "d6": 42,
    "d7": 59.24
  }
},
{
  "time": "2015-10-14 15:01:20",
  "values": {
    "d1": 3956.58,
    "d2": 0,
    "d3": 1,
    "d4": 0.81,
    "d5": 121.57,
    "d6": 42,
    "d7": 59.24
  }
}
]
}'''

    json_thing = json.loads(s)
    csv_name = to_csv(json_obj=json_thing)

    with open(csv_name) as f:
        print(f.read())

if __name__ == '__main__':
    main()

Upvotes: 0

Prahlad Yeri
Prahlad Yeri

Reputation: 3663

First of all, your JSON is incorrect. Correct it, and Validate it before use. After that, you can do something like this to get the data in python:

    import json
    fp =open('/path/tp/my/file')
    mystr = fp.read()
    fp.close()
    data = json.loads(mystr)

Upvotes: 2

Happy001
Happy001

Reputation: 6383

I slightly modified your JSON string and added one more record to have diferent 'Hour' groups.

import pandas as pd
import numpy as np
import json

jsondata = '''{
"data": [
{
"time": "2015-10-14 15:01:10",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 19,
"d4": 6.21,
"d5": 105.99,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 15:01:20",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 16:01:20",
"values": {
"d1": 31956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
}
]
}
'''

data = json.loads(jsondata)['data']
#If your JSON data is in a file, then do:
#data = json.load(jsonfile)['data']  

df = pd.DataFrame(data=[record['values'] for record in data], 
                  index=pd.DatetimeIndex([record['time'] for record in data], name='time'))


print df

print df.groupby(pd.Grouper(freq='H')).agg([np.mean, max, min])

Output(df):

                           d1  d2  d3    d4      d5  d6     d7
time                                                          
2015-10-14 15:01:10   3956.58   0  19  6.21  105.99  42  59.24
2015-10-14 15:01:20   3956.58   0   1  0.81  121.57  42  59.24
2015-10-14 16:01:20  31956.58   0   1  0.81  121.57  42  59.24

Output statistics:

                           d1                       d2           d3          \
                         mean       max       min mean max min mean max min   
time                                                                          
2015-10-14 15:00:00   3956.58   3956.58   3956.58    0   0   0   10  19   1   
2015-10-14 16:00:00  31956.58  31956.58  31956.58    0   0   0    1   1   1   

                       d4  ...              d5                   d6          \
                     mean  ...     min    mean     max     min mean max min   
time                       ...                                                
2015-10-14 15:00:00  3.51  ...    0.81  113.78  121.57  105.99   42  42  42   
2015-10-14 16:00:00  0.81  ...    0.81  121.57  121.57  121.57   42  42  42   

                        d7                
                      mean    max    min  
time                                      
2015-10-14 15:00:00  59.24  59.24  59.24  
2015-10-14 16:00:00  59.24  59.24  59.24  

[2 rows x 21 columns]

Using pd.read_json directly seems not working because resulting dataframe has unexpected structure which is hard to use.

Upvotes: 2

Wissam Youssef
Wissam Youssef

Reputation: 830

As you can see "data" is actually an array, look at the open bracket after it. So you would want to go first member of the array, then to time. Since it is truncated I am going to assume that all the members of the array are the same. So to access you would want something like data[0]['time']

Upvotes: 0

Related Questions