nice_name

Reputation: 37

How do I get this JSON time series data into a pandas dataframe?

I have time series data from an API that I'd like to get into a Python pandas DataFrame.
How would you do this?

The data look like this:

[{'id': 38421212541,
  'sensor_id': 12944473,
  'value': '6852.426',
  'date': '2015-02-05',
  'min': '0.0',
  'max': '833.789',
  'avg': '285.5177',
  'values': '[344.336,306.05449999999996,269.922,233.9845,198.63299999999998,164.0625,128.9065,96.2895,66.797,50.391,27.344,10.938,8.203,6.641,4.297,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10.352,23.047,41.797,59.961,93.75,135.938,171.875,209.375,247.266,304.1015,345.703,387.5,438.4765,489.6485,524.414,579.688,627.9300000000001,680.859,722.656,776.563,802.344,811.5235,816.016,822.2655,826.3675000000001,830.2735,830.078,831.836,831.836,833.789,831.25,828.906,828.125,828.3205,827.3435,825,821.875,820.508,817.188,811.7185,810.547,807.422,803.516,801.172,796.094,792.7735,788.672,786.328,783.5935,778.125,775,772.266,768.359,765.625,763.672,759.766,758.203,757.422,753.516,752.734,748.633,751.172,748.047,747.461,747.266,749.6095,747.266,748.828,747.266,748.438,752.9295,750.1949999999999,755.078,756.641,759.961,760.1565,764.4535000000001,767.383,767.188,770.703,774.219,774.6095,780.469,782.8125,785.547,785.156,793.359,794.531,795.8985,798.047,800,805.859,805.469,810.1565,815.4295,811.719,814.453,817.578,814.063,817.188,817.969,813.281,812.5,810.938,786.133,755.859,719.922,676.3675000000001,642.383,606.0550000000001,565.039,534.1795,494.922,448.24199999999996,412.305,379.8825]',
  'start_time': None,
  'start_value': None},
 {'id': 38421212540,
  'sensor_id': 12944473,
  'value': '6771.2902',
  'date': '2015-02-04',
  'min': '0.0',
  'max': '828.9065',
  'avg': '282.1371',
  'values': '[333.0075,297.0705,253.32049999999998,211.7185,173.828,157.422,100.391,100,69.5315,36.3285,24.219,13.281,12.891,10.547,6.836,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.172,17.188,35.547,55.078,78.516,118.75,160.938,192.969,210.156,262.8905,309.766,376.953,416.0155,460.156,511.719,567.578,609.1800000000001,662.5,703.516,747.8515,777.344,792.578,801.3675000000001,812.6955,819.727,825,828.906,822.266,825,825.781,827.344,827.9295,828.9065,825.781,817.578,810.9375,806.25,806.25,803.516,799.0235,796.484,794.336,792.7735,787.6955,784.766,782.422,777.734,775.781,773.438,768.75,764.8435,763.086,763.672,754.883,755.8595,759.766,754.1015,749.4145,746.875,748.2425000000001,744.141,745.313,747.6565,742.969,739.8435,742.7735,742.7735,744.141,748.438,749.0235,740.039,750.781,748.242,751.172,755.664,757.813,764.258,765.625,763.672,766.992,770.8985,772.266,773.2425000000001,781.641,787.5,785.938,789.453,792.188,796.6800000000001,800.3905,798.8285000000001,801.563,802.344,805.859,805.2735,813.2815,808.984,810.156,813.672,812.109,812.6955,814.0625,808.5935,808.008,777.9300000000001,747.0705,707.8125,677.7345,636.7185,602.3435,566.992,529.4925000000001,486.3285,451.172,417.7735,382.8125]',
  'start_time': None,
  'start_value': None},
 {'id': 38421163212,
  'sensor_id': 12944473,
  'value': '6652.3768',
  'date': '2015-02-03',
  'min': '0.0',
  'max': '807.2265',
  'avg': '277.1824',
  'values': '[331.44550000000004,297.65599999999995,256.25,220.8985,184.57,149.6095,100.9765,66.9925,31.6405,42.382999999999996,17.969,12.5,8.594,6.641,3.906,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.391,12.6955,20.8985,48.047,78.516,107.6175,150.586,185.9375,222.2655,260.15650000000005,312.8905,350,402.344,436.133,486.719,536.328,584.375,637.1095,675.781,717.969,762.8905,771.4845,778.5155,784.961,790.4295,791.406,797.461,800.7815,802.9295,803.5155,804.8824999999999,805.469,804.6875,803.3205,800,800.1955,798.047,797.8515,795.508,792.578,789.453,787.6955,784.766,779.297,777.9295,775.586,771.094,768.5545,766.016,762.5,758.203,756.25,753.906,750.391,747.656,746.6800000000001,741.797,738.672,736.5235,734.5705,738.086,732.422,732.422,733.594,732.813,732.0315,733.984,734.766,735.3515,736.719,737.6955,738.672,740.234,746.094,744.531,752.9295,751.172,755.859,757.031,761.7185,764.258,766.406,771.094,773.047,776.953,778.125,783.008,783.984,787.891,793.75,795.703,798.047,800.586,799.6095,798.8285000000001,800.9765,806.25,805.664,805.469,805.859,806.25,807.2265,804.883,803.125,772.8515,739.258,698.4375,665.625,624.0235,594.336,551.5625,514.6485,471.875,437.30449999999996,402.539,369.336]',
  'start_time': None,
  'start_value': None}]

I got that data with:

import requests

# query for data from sensor with id number "12944473"
# populate response 'r' and JSON 'j' with the first page of the response
r = requests.get('https://foo-bar-company.com/sensors/12944473/daily_data', auth=(user, pwd))
j = r.json()

# see how many pages there are.
# loop through all pages, extending 'j' with each page's records
last_uri = r.links['last']['url']

while r.url != last_uri:
    r = requests.get(r.links['next']['url'], auth=(user, pwd))
    j.extend(r.json())  # each page is itself a list of records, so extend rather than append

# print a few days to share with SO
print(j[0:3])

The goal is something like this:

                  12944473
2015-02-05 00:00  344.336
2015-02-05 00:05  306.0545
2015-02-05 00:10  269.922
2015-02-05 00:15  233.9845
2015-02-05 00:20  198.633
2015-02-05 00:25  164.0625
2015-02-05 00:30  128.9065
2015-02-05 00:35  96.2895
2015-02-05 00:40  66.797
2015-02-05 00:45  50.391
2015-02-05 00:50  27.344
2015-02-05 00:55  10.938
...

I tried to keep this short and to the point, but I'm happy to add more details. The data are sunlight measurements. This question differs from similar posts I've found in that the data are a time series.

How do you protect against DST and leap year issues when you're building the date-time index?
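For example, here's the kind of naive index construction I'm worried about (a sketch of the concern only, not code I'm running against the real API): a tz-aware range of 288 five-minute slots spills into the next local day when a DST transition makes that day only 23 hours long.

import pandas as pd

# 2015-03-08 is a US 'spring forward' date: the local day has only 23 hours,
# so 288 five-minute periods run past midnight into March 9.
idx = pd.date_range('2015-03-08', periods=288, freq='5min', tz='US/Eastern')
print(idx[-1])  # 2015-03-09 00:55:00-04:00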

I'm new to Python, and I'm hoping there's an elegant solution to this that I just haven't seen.
Thanks in advance!


Python 3.7
Pandas 0.25.2
foo bar company's RESTful API

Upvotes: 1

Views: 3163

Answers (1)

Dave Fol

Reputation: 525

Quick Answer

Assuming you have pandas 0.25 or later, which added DataFrame.explode, and with your_json_dict standing in for the list the question calls j:

import ast
import pandas as pd

# ast.literal_eval safely parses the stringified 'values' list into floats
# (unlike eval, it won't execute arbitrary code)
data = [{'date': x['date'], 'values': ast.literal_eval(x['values'])} for x in your_json_dict]
pd.DataFrame(data).explode('values')

results in

          date   values
0   2015-02-05  344.336
0   2015-02-05  306.054
0   2015-02-05  269.922
0   2015-02-05  233.984
0   2015-02-05  198.633
..         ...      ...
2   2015-02-03  514.649
2   2015-02-03  471.875
2   2015-02-03  437.304
2   2015-02-03  402.539
2   2015-02-03  369.336

Explanation

data = [{'date': x['date'], 'values': ast.literal_eval(x['values'])} for x in your_json_dict]

This is a list comprehension. It builds a new list of dictionaries, each containing only the keys 'date' and 'values', and converts 'values' from a string into a list of numbers via ast.literal_eval.
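For instance, the standard library can turn one of those stringified lists into real floats (shown here on a shortened literal for readability):

import ast

ast.literal_eval('[344.336,306.054,269.922]')  # -> [344.336, 306.054, 269.922]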

pd.DataFrame(data).explode('values')

pandas is perfectly fine with accepting a list of dictionaries. explode, added in pandas 0.25, expands a list-valued column into one row per element of the list, repeating the other columns.
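To get all the way to the time-indexed layout in the question, one option is to build a DatetimeIndex per record from its own 'date' field with pd.date_range. This is a sketch rather than part of the answer above: it assumes each 'values' list starts at midnight on the record's date with one reading per 5-minute slot, and the helper name to_frame is made up here (j is the list from the question's code):

import ast
import pandas as pd

def to_frame(records):
    """One pd.Series per daily record, concatenated into a single column."""
    pieces = []
    for rec in records:
        values = ast.literal_eval(rec['values'])
        # Sizing the index from len(values) keeps short or long days aligned,
        # and leap years are a non-issue because each day starts from its own date.
        idx = pd.date_range(start=rec['date'], periods=len(values), freq='5min')
        pieces.append(pd.Series(values, index=idx, name=rec['sensor_id']))
    # The API pages arrive newest-first, so sort into chronological order.
    return pd.concat(pieces).sort_index().to_frame()

df = to_frame(j)
print(df.head())

If the readings are local times in a DST-observing zone, you could localize the result afterwards with df.tz_localize(), handling ambiguous and nonexistent times explicitly; if they are UTC, DST never enters the picture.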

Upvotes: 1
