Reputation: 767
I have json file i am trying to parse out and get just two values out. 'timestamps' and 'values'. Here is the sample of my json file:
{'totalCount': 1,
'nextPageKey': None,
'result': [{'metricId': 'builtin:synthetic.browser.event.visuallyComplete.load',
'data': [{'dimensions': ['SYNTHETIC_TEST_STEP-123456'],
'dimensionMap': {'dt.entity.synthetic_test_step': 'SYNTHETIC_TEST_STEP-123456'},
'timestamps': [
1596326400000,
1616976000000,
1617062400000,
1617148800000,
1617235200000],
'values': [
3880.834490740741,
3879.6458333333335,
3826.3645833333335,
3890.871527777778,
3876.8199643493763]}]}]}
I tried using approach with pandas:
import pandas as pd
pdp = pd.json_normalize(PDPjson['result'],['data'])
so I can get it into data frame but I am getting column with list of dates and column with list of values.
What I am trying to get is just a dataframe with two columns, Timestamp and Values.
How do I do this from here?
Upvotes: 1
Views: 92
Reputation: 30971
I created your JSON object as follows:
txt = '''\
{ "totalCount": 1,
"nextPageKey": "None",
"result": [{"metricId": "builtin:synthetic.browser.event.visuallyComplete.load",
"data": [{"dimensions": ["SYNTHETIC_TEST_STEP-123456"],
"dimensionMap": {"dt.entity.synthetic_test_step": "SYNTHETIC_TEST_STEP-123456"},
"timestamps": [ 1596326400000, 1616976000000, 1617062400000,
1617148800000, 1617235200000],
"values": [ 3880.834490740741, 3879.6458333333335, 3826.3645833333335,
3890.871527777778, 3876.8199643493763]}]}]}'''
PDPjson = json.loads(txt)
Then I started from json_normalize just as you did:
pdp = pd.json_normalize(PDPjson['result'], 'data')
The missing step is:
result = pdp[['timestamps', 'values']].apply(lambda col: col.explode())\
.reset_index(drop=True)
The result is:
timestamps values
0 1596326400000 3880.83
1 1616976000000 3879.65
2 1617062400000 3826.36
3 1617148800000 3890.87
4 1617235200000 3876.82
Don't worry about smaller precision of values column, it is only the way how Jupyter Notebook presents the data.
When you run e.g. result.iloc[0,1]
you will get:
3880.834490740741
with full precision.
Upvotes: 1
Reputation: 41327
You can apply Series.explode
on the timestamps
and values
:
pdp = pdp[['timestamps', 'values']].apply(pd.Series.explode).reset_index(drop=True)
# timestamps values
# 0 1596326400000 3880.834491
# 1 1616976000000 3879.645833
# 2 1617062400000 3826.364583
# 3 1617148800000 3890.871528
# 4 1617235200000 3876.819964
Upvotes: 1