Reputation: 7978
I have this JSON string:
{
    "2022-09-28T00:45:00.000Z": [
        {
            "value": 0.216,
            "plantId": "27050937",
            "man": "pippo"
        },
        {
            "value": 0.224,
            "plantId": "30082443",
            "man": "pippo"
        }
    ],
    "2022-09-28T00:30:00.000Z": [
        {
            "value": 0.248,
            "plantId": "27050937",
            "man": "pippo"
        },
        {
            "value": 0.108,
            "plantId": "30082443",
            "man": "pippo"
        }
    ]
}
I would like to import the data in order to build a numpy matrix indexed as (plantId, time). For example, I would get a matrix like:
30082443  0.224 0.108
27050937  0.216 0.248
....
I tried with pandas:
a = pd.read_json("./data.json", orient='index')
But I get:
2022-09-28 00:45:00+00:00 {'value': 0.216, 'plantId': '27050937', 'man... {'value': 0.224, 'plantId': '30082443', 'man...
2022-09-28 00:30:00+00:00 {'value': 0.248, 'plantId': '27050937', 'man... {'value': 0.108, 'plantId': '30082443', 'man...
Upvotes: 1
Views: 146
Reputation: 863631
Use a list comprehension to merge the dates (the outer keys) into the nested dictionaries, then pass the result to the DataFrame constructor:
import json
import pandas as pd

with open('./data.json') as data_file:
    data = json.load(data_file)

df = pd.DataFrame([{**{'date': k}, **x} for k, v in data.items() for x in v])
print (df)
date value plantId man
0 2022-09-28T00:45:00.000Z 0.216 27050937 pippo
1 2022-09-28T00:45:00.000Z 0.224 30082443 pippo
2 2022-09-28T00:30:00.000Z 0.248 27050937 pippo
3 2022-09-28T00:30:00.000Z 0.108 30082443 pippo
EDIT: If you next need to reshape the DataFrame by the plantId and value columns, use GroupBy.cumcount with DataFrame.pivot:
df1 = (df.assign(g = df.groupby('plantId').cumcount().add(1))
         .pivot(index='plantId', columns='g', values='value')
         .add_prefix('value'))
print (df1)
g value1 value2
plantId
27050937 0.216 0.248
30082443 0.224 0.108
Or, if you need the dates as the columns when pivoting:
df2 = df.pivot(index='plantId', columns='date', values='value')
#df2 = df.pivot_table(index='plantId', columns='date', values='value', aggfunc='mean')
print (df2)
date 2022-09-28T00:30:00.000Z 2022-09-28T00:45:00.000Z
plantId
27050937 0.248 0.216
30082443 0.108 0.224
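Since the question asked for a NumPy matrix indexed as (plantId, time), the pivoted frame converts directly with to_numpy(). A self-contained sketch combining the steps above (the variable names mat, plants, and times are illustrative, not from the original answer):

```python
import pandas as pd

data = {
    "2022-09-28T00:45:00.000Z": [
        {"value": 0.216, "plantId": "27050937", "man": "pippo"},
        {"value": 0.224, "plantId": "30082443", "man": "pippo"},
    ],
    "2022-09-28T00:30:00.000Z": [
        {"value": 0.248, "plantId": "27050937", "man": "pippo"},
        {"value": 0.108, "plantId": "30082443", "man": "pippo"},
    ],
}

# flatten {date: [records]} into rows, then pivot to (plantId, date)
df = pd.DataFrame([{'date': k, **x} for k, v in data.items() for x in v])
df2 = df.pivot(index='plantId', columns='date', values='value')

mat = df2.to_numpy()           # the matrix, shape (n_plants, n_times)
plants = df2.index.to_list()   # row labels (plantId)
times = df2.columns.to_list()  # column labels (time), sorted by pivot
print(mat)
```

Note that pivot sorts both axes, so keep the index/columns labels alongside the matrix to know which row and column is which.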
Upvotes: 2
Reputation: 4608
@jezrael's answer is much shorter and more efficient.
Combining the two answers:
df = pd.DataFrame([{**{'date': k}, **x} for k, v in a.items() for x in v])  # jezrael's answer
df2 = df.groupby('plantId')['value'].agg(list).reset_index()
df2[['value_1', 'value_2']] = pd.DataFrame(df2.value.tolist(), index=df2.index)
df2
'''
plantId value value_1 value_2
0 27050937 [0.216, 0.248] 0.216 0.248
1 30082443 [0.224, 0.108] 0.224 0.108
'''
Details (long solution):
a = {
    "2022-09-28T00:45:00.000Z": [
        {
            "value": 0.216,
            "plantId": "27050937",
            "man": "pippo"
        },
        {
            "value": 0.224,
            "plantId": "30082443",
            "man": "pippo"
        }
    ],
    "2022-09-28T00:30:00.000Z": [
        {
            "value": 0.248,
            "plantId": "27050937",
            "man": "pippo"
        },
        {
            "value": 0.108,
            "plantId": "30082443",
            "man": "pippo"
        }
    ]
}
df = pd.DataFrame(a)
2022-09-28T00:45:00.000Z 2022-09-28T00:30:00.000Z
0 {'value': 0.216, 'plantId': '27050937', 'man': 'pippo'} {'value': 0.248, 'plantId': '27050937', 'man': 'pippo'}
1 {'value': 0.224, 'plantId': '30082443', 'man': 'pippo'} {'value': 0.108, 'plantId': '30082443', 'man': 'pippo'}
As the data increases, the number of columns will also increase. Now we need to convert these columns to rows.
df = df.T.reset_index()  # .T is shorthand for transpose()
We used the transpose function to convert the columns to rows. The DataFrame now looks like this:
index 0 1
0 2022-09-28T00:45:00.000Z {'value': 0.216, 'plantId': '27050937', 'man': 'pippo'} {'value': 0.224, 'plantId': '30082443', 'man': 'pippo'}
1 2022-09-28T00:30:00.000Z {'value': 0.248, 'plantId': '27050937', 'man': 'pippo'} {'value': 0.108, 'plantId': '30082443', 'man': 'pippo'}
Now let's parse the dictionary objects (columns 0 and 1) in the rows into new columns, using the json_normalize function.
first = pd.json_normalize(df.pop(0))
df = df.join(first)
second = pd.json_normalize(df.pop(1))
df = pd.concat([df, second])
print(df)
'''
index value plantId man
0 2022-09-28T00:45:00.000Z 0.216 27050937 pippo
1 2022-09-28T00:30:00.000Z 0.248 27050937 pippo
0 NaN 0.224 30082443 pippo
1 NaN 0.108 30082443 pippo
'''
# if you want to fill the NaNs with the date from the row that has the same index label:
df = df.fillna({'index': df[['index']][df['index'].notnull()].squeeze()})
'''
index value plantId man
0 2022-09-28T00:45:00.000Z 0.216 27050937 pippo
1 2022-09-28T00:30:00.000Z 0.248 27050937 pippo
0 2022-09-28T00:45:00.000Z 0.224 30082443 pippo
1 2022-09-28T00:30:00.000Z 0.108 30082443 pippo
'''
json_normalize returns only the processed column's values, but we also need the other columns, so we use the join function. Then we do the same operation on the column named 1, but here I use concat instead of join, because the column names are the same and I want the data appended one below the other (as rows, not as columns). Finally, group by plantId, collect the values into lists, then split the list items into new columns.
df2 = df.groupby('plantId')['value'].agg(list).reset_index()
df2[['value_1', 'value_2']] = pd.DataFrame(df2.value.tolist(), index=df2.index)
print(df2)
'''
plantId value value_1 value_2
0 27050937 [0.216, 0.248] 0.216 0.248
1 30082443 [0.224, 0.108] 0.224 0.108
'''
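If the end goal is the NumPy matrix from the question, the value columns of df2 can be extracted with set_index and to_numpy. A minimal sketch, recreating the df2 frame printed above inline so it runs standalone:

```python
import pandas as pd

# df2 as produced by the groupby/agg(list) step above
df2 = pd.DataFrame({
    'plantId': ['27050937', '30082443'],
    'value_1': [0.216, 0.224],
    'value_2': [0.248, 0.108],
})

# rows indexed by plantId, columns by time slot
mat = df2.set_index('plantId')[['value_1', 'value_2']].to_numpy()
print(mat)
```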
Upvotes: 1