Pandas Read Nested Json Data

Question

I have this JSON string:

{
    "2022-09-28T00:45:00.000Z": [
        {
            "value": 0.216,
            "plantId": "27050937",
            "man": "pippo"
        },
        {
            "value": 0.224,
            "plantId": "30082443",
            "man": "pippo"
        }
    ],
    "2022-09-28T00:30:00.000Z": [
        {
            "value": 0.248,
            "plantId": "27050937",
            "man": "pippo"
        },
        {
            "value": 0.108,
            "plantId": "30082443",
            "man": "pippo"
        }
    ]
}

I would like to import the data in order to build a numpy matrix indexed as (plantId, time). For example, I would get a matrix like:

  300082443 0.216 0.108
  27050937  0.224 0.248
  ....

I tried with pandas:

a =pd.read_json("./data.json",orient='index')

But I get:

2022-09-28 00:45:00+00:00  {'value': 0.216, 'plantId': '27050937', 'man...  {'value': 0.224, 'plantId': '30082443', 'man...
2022-09-28 00:30:00+00:00  {'value': 0.248, 'plantId': '27050937', 'man...  {'value': 0.108, 'plantId': '30082443', 'man...

Bushmaster · Accepted Answer

@jezrael's answer is much shorter and more efficient.

Combining the two answers:

df = pd.DataFrame([{**{'date': k}, **x} for k, v in a.items() for x in v]) #jezrael's answer

df2=df.groupby('plantId')['value'].agg(list).reset_index()
df2[['value_1','value_2']] = pd.DataFrame(df2.value.tolist(), index= df2.index)
df2
'''
plantId           value  value_1  value_2
0  27050937  [0.216, 0.248]    0.216    0.248
1  30082443  [0.224, 0.108]    0.224    0.108
'''

details (long solution):

a={
    "2022-09-28T00:45:00.000Z": [
        {
            "value": 0.216,
            "plantId": "27050937",
            "man": "pippo"
        },
        {
            "value": 0.224,
            "plantId": "30082443",
            "man": "pippo"
        }
    ],
    "2022-09-28T00:30:00.000Z": [
        {
            "value": 0.248,
            "plantId": "27050937",
            "man": "pippo"
        },
        {
            "value": 0.108,
            "plantId": "30082443",
            "man": "pippo"
        }
    ]
}

df=pd.DataFrame(a)

df:

    2022-09-28T00:45:00.000Z                                2022-09-28T00:30:00.000Z
0   {'value': 0.216, 'plantId': '27050937', 'man': 'pippo'} {'value': 0.248, 'plantId': '27050937', 'man': 'pippo'}
1   {'value': 0.224, 'plantId': '30082443', 'man': 'pippo'} {'value': 0.108, 'plantId': '30082443', 'man': 'pippo'}

As the data increases, the number of columns will also increase. Now we need to convert these columns to rows.

df = df.T.reset_index() # T = transpose()

We used the transpose function to convert columns to rows. latest version:

        index                       0                                       1
0       2022-09-28T00:45:00.000Z    {'value': 0.216, 'plantId': '27050937', 'man': 'pippo'} {'value': 0.224, 'plantId': '30082443', 'man': 'pippo'}
1       2022-09-28T00:30:00.000Z    {'value': 0.248, 'plantId': '27050937', 'man': 'pippo'} {'value': 0.108, 'plantId': '30082443', 'man': 'pippo'}

now let's parse the dictonary objects (columns 0 and 1) in the rows into new columns. We will use the json_normalize function for this.

first= pd.json_normalize(df.pop(0))
df=df.join(first)
second=pd.json_normalize(df.pop(1))
df = pd.concat([df,second])
print(df)
'''
                      index  value   plantId    man
0  2022-09-28T00:45:00.000Z  0.216  27050937  pippo
1  2022-09-28T00:30:00.000Z  0.248  27050937  pippo
0                       NaN  0.224  30082443  pippo
1                       NaN  0.108  30082443  pippo

'''
#if you want to fill NaNs use:
df = df.fillna({'index':df[['index']][df['index'].notnull()].squeeze()})
#fill nans with same index number.
'''
                      index  value   plantId    man
0  2022-09-28T00:45:00.000Z  0.216  27050937  pippo
1  2022-09-28T00:30:00.000Z  0.248  27050937  pippo
0  2022-09-28T00:45:00.000Z  0.224  30082443  pippo
1  2022-09-28T00:30:00.000Z  0.108  30082443  pippo
'''

json_normalize returns only the processed column values. But we also need other columns so we use join function. Then we should do the same operation in the column named 1. but here I am using concat instead of join because the column names are the same and I want the data to be inserted one below the other (as rows not as columns). finally group by plant_id and add the values to the list then split the list items into new columns.

df2=df.groupby('plantId')['value'].agg(list).reset_index()
df2[['value_1','value_2']] = pd.DataFrame(df2.value.tolist(), index= df2.index)
print(df2)
'''
    plantId           value  value_1  value_2
0  27050937  [0.216, 0.248]    0.216    0.248
1  30082443  [0.224, 0.108]    0.224    0.108
'''

Pandas Read Nested Json Data

Answers (2)

Related Questions