Reputation: 37
I have time series data from an API I'd like to get in to a python pandas dataframe.
How would you do this?
The data look like this:
[{'id': 38421212541,
'sensor_id': 12944473,
'value': '6852.426',
'date': '2015-02-05',
'min': '0.0',
'max': '833.789',
'avg': '285.5177',
'values': '[344.336,306.05449999999996,269.922,233.9845,198.63299999999998,164.0625,128.9065,96.2895,66.797,50.391,27.344,10.938,8.203,6.641,4.297,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10.352,23.047,41.797,59.961,93.75,135.938,171.875,209.375,247.266,304.1015,345.703,387.5,438.4765,489.6485,524.414,579.688,627.9300000000001,680.859,722.656,776.563,802.344,811.5235,816.016,822.2655,826.3675000000001,830.2735,830.078,831.836,831.836,833.789,831.25,828.906,828.125,828.3205,827.3435,825,821.875,820.508,817.188,811.7185,810.547,807.422,803.516,801.172,796.094,792.7735,788.672,786.328,783.5935,778.125,775,772.266,768.359,765.625,763.672,759.766,758.203,757.422,753.516,752.734,748.633,751.172,748.047,747.461,747.266,749.6095,747.266,748.828,747.266,748.438,752.9295,750.1949999999999,755.078,756.641,759.961,760.1565,764.4535000000001,767.383,767.188,770.703,774.219,774.6095,780.469,782.8125,785.547,785.156,793.359,794.531,795.8985,798.047,800,805.859,805.469,810.1565,815.4295,811.719,814.453,817.578,814.063,817.188,817.969,813.281,812.5,810.938,786.133,755.859,719.922,676.3675000000001,642.383,606.0550000000001,565.039,534.1795,494.922,448.24199999999996,412.305,379.8825]',
'start_time': None,
'start_value': None},
{'id': 38421212540,
'sensor_id': 12944473,
'value': '6771.2902',
'date': '2015-02-04',
'min': '0.0',
'max': '828.9065',
'avg': '282.1371',
'values': '[333.0075,297.0705,253.32049999999998,211.7185,173.828,157.422,100.391,100,69.5315,36.3285,24.219,13.281,12.891,10.547,6.836,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.172,17.188,35.547,55.078,78.516,118.75,160.938,192.969,210.156,262.8905,309.766,376.953,416.0155,460.156,511.719,567.578,609.1800000000001,662.5,703.516,747.8515,777.344,792.578,801.3675000000001,812.6955,819.727,825,828.906,822.266,825,825.781,827.344,827.9295,828.9065,825.781,817.578,810.9375,806.25,806.25,803.516,799.0235,796.484,794.336,792.7735,787.6955,784.766,782.422,777.734,775.781,773.438,768.75,764.8435,763.086,763.672,754.883,755.8595,759.766,754.1015,749.4145,746.875,748.2425000000001,744.141,745.313,747.6565,742.969,739.8435,742.7735,742.7735,744.141,748.438,749.0235,740.039,750.781,748.242,751.172,755.664,757.813,764.258,765.625,763.672,766.992,770.8985,772.266,773.2425000000001,781.641,787.5,785.938,789.453,792.188,796.6800000000001,800.3905,798.8285000000001,801.563,802.344,805.859,805.2735,813.2815,808.984,810.156,813.672,812.109,812.6955,814.0625,808.5935,808.008,777.9300000000001,747.0705,707.8125,677.7345,636.7185,602.3435,566.992,529.4925000000001,486.3285,451.172,417.7735,382.8125]',
'start_time': None,
'start_value': None},
{'id': 38421163212,
'sensor_id': 12944473,
'value': '6652.3768',
'date': '2015-02-03',
'min': '0.0',
'max': '807.2265',
'avg': '277.1824',
'values': '[331.44550000000004,297.65599999999995,256.25,220.8985,184.57,149.6095,100.9765,66.9925,31.6405,42.382999999999996,17.969,12.5,8.594,6.641,3.906,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.391,12.6955,20.8985,48.047,78.516,107.6175,150.586,185.9375,222.2655,260.15650000000005,312.8905,350,402.344,436.133,486.719,536.328,584.375,637.1095,675.781,717.969,762.8905,771.4845,778.5155,784.961,790.4295,791.406,797.461,800.7815,802.9295,803.5155,804.8824999999999,805.469,804.6875,803.3205,800,800.1955,798.047,797.8515,795.508,792.578,789.453,787.6955,784.766,779.297,777.9295,775.586,771.094,768.5545,766.016,762.5,758.203,756.25,753.906,750.391,747.656,746.6800000000001,741.797,738.672,736.5235,734.5705,738.086,732.422,732.422,733.594,732.813,732.0315,733.984,734.766,735.3515,736.719,737.6955,738.672,740.234,746.094,744.531,752.9295,751.172,755.859,757.031,761.7185,764.258,766.406,771.094,773.047,776.953,778.125,783.008,783.984,787.891,793.75,795.703,798.047,800.586,799.6095,798.8285000000001,800.9765,806.25,805.664,805.469,805.859,806.25,807.2265,804.883,803.125,772.8515,739.258,698.4375,665.625,624.0235,594.336,551.5625,514.6485,471.875,437.30449999999996,402.539,369.336]',
'start_time': None,
'start_value': None}]
I got that data with:
import requests
import json
# query for data from sensor with id number "12944473"
# populate response 'r' and JSON 'j' with the first page of the response
r = requests.get('https://foo-bar-company.com/sensors/12944473/daily_data', auth=(user, pwd))
j = r.json()
# see how many pages there are.
# loop through all pages, appending to 'j'
last_uri = r.links['last']['url']
while r.url != last_uri:
r = requests.get(r.links['next']['url'], auth=(user, pwd))
j.append(r.json())
# print a few days to share with SO
print(j[0:3:1])
The goal is something like this:
12944473
10/2/2015 0:00 344.336
10/2/2015 0:05 306.0545
10/2/2015 0:10 269.922
10/2/2015 0:15 233.9845
10/2/2015 0:20 198.633
10/2/2015 0:25 164.0625
10/2/2015 0:30 128.9065
10/2/2015 0:35 96.2895
10/2/2015 0:40 66.797
10/2/2015 0:45 50.391
10/2/2015 0:50 27.344
10/2/2015 0:55 10.938
...
I tried to keep this short and to the point, but happy to add more details. The data are sunlight measurements. This question is somewhat unique from other posts I've found because of it being time-series data.
How do you protect against DST and leap year issues when you're building the date-time index?
I'm new to Python and I'm hoping there's elegant solution to this I just haven't seen.
Thanks in advance!
Python 3.7
Pandas 0.25.2
foo bar company's RESTful API
Upvotes: 1
Views: 3163
Reputation: 525
Assuming you have the latest version of pandas.
data = [{'date': x['date'], 'values': eval(x['values'])} for x in your_json_dict]
pd.DataFrame(data).explode('values')
results in
date values
0 2015-02-05 344.336
0 2015-02-05 306.054
0 2015-02-05 269.922
0 2015-02-05 233.984
0 2015-02-05 198.633
.. ... ...
2 2015-02-03 514.649
2 2015-02-03 471.875
2 2015-02-03 437.304
2 2015-02-03 402.539
2 2015-02-03 369.336
[{'date': x['date'], 'values': eval(x['values'])} for x in your_json_dict]
This is a list comprehension. It creates a new list of dictionaries where each dictionary has the keys 'date' and 'values'. It also converts 'values' from a string to a list of numbers.
pd.DataFrame(data).explode('values')
pandas is perfectly fine with accepting a list of dictionaries. The explode function is a new feature in version 0.25 that expands a column that is a list into rows for each element of that list.
Upvotes: 1