Reputation: 267
I have two excel files:
2020-01-consumption.xlsx
2020-01-production.xlsx
print(pd.read_excel('2020-01-production.xlsx', index_col=0).head(4))
build_1 build_2 build_3 build_4 ...
date ...
2020-01-01 00:00:00 1.2 4.2 4.3 7.0 ...
2020-01-01 01:00:00 3.3 1.9 5.3 3.5 ...
2020-01-01 02:00:00 4.1 2.7 6.0 2.6 ...
2020-01-01 03:00:00 3.6 6.0 7.1 7.2 ...
print(pd.read_excel('2020-01-consumption.xlsx', index_col=0).head(4))
build_1 build_2 build_3 build_4 ...
date ...
2020-01-01 00:00:00 0.4 1.0 0.1 1.0 ...
2020-01-01 01:00:00 0.3 0.9 0.0 0.4 ...
2020-01-01 02:00:00 0.3 0.5 0.0 0.4 ...
2020-01-01 03:00:00 0.1 0.5 0.4 0.4 ...
The columns and indexes are the same in both files. I'm trying to set up a for loop that saves each column as a JSON file, changing the structure of the data to this:
with open('build_1.json', encoding="utf8") as f:  # the new file's name should be the column name
    content = json.load(f)
print(content)
{'build_1': {  # the key is the column name
    'date': ['2020-01-01 00:00:00', '2020-01-01 01:00:00', '2020-01-01 02:00:00', '2020-01-01 03:00:00', ...],  # the index name becomes a key
    'production': [1.2, 3.3, 4.1, 3.6, ...],  # the excel file name becomes a key
    'consumption': [0.4, 0.3, 0.3, 0.1, ...]}}  # the excel file name becomes a key
I have many dataframes like production and consumption; I'm only showing two as an example. How can I achieve this structure? Is it possible?
Upvotes: 1
Views: 23
Reputation: 863301
You can use concat with the keys parameter to create a MultiIndex in the columns:
import json
import pandas as pd

df1 = pd.read_excel('2020-01-production.xlsx', index_col=0)
df2 = pd.read_excel('2020-01-consumption.xlsx', index_col=0)
df = pd.concat([df1, df2], keys=['production', 'consumption'], axis=1)
print(df)
production consumption \
build_1 build_2 build_3 build_4 build_1 build_2
date
2020-01-01 00:00:00 1.2 4.2 4.3 7.0 0.4 1.0
2020-01-01 01:00:00 3.3 1.9 5.3 3.5 0.3 0.9
2020-01-01 02:00:00 4.1 2.7 6.0 2.6 0.3 0.5
2020-01-01 03:00:00 3.6 6.0 7.1 7.2 0.1 0.5
build_3 build_4
date
2020-01-01 00:00:00 0.1 1.0
2020-01-01 01:00:00 0.0 0.4
2020-01-01 02:00:00 0.0 0.4
2020-01-01 03:00:00 0.4 0.4
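For example, selecting a single building from the combined frame with DataFrame.xs gives one small frame holding both measures (a sketch assuming the sample data shown above; 'build_1' is just the first column name from the example):
one = df.xs('build_1', axis=1, level=1)
print(one)
#                      production  consumption
# date
# 2020-01-01 00:00:00         1.2          0.4
# 2020-01-01 01:00:00         3.3          0.3
# 2020-01-01 02:00:00         4.1          0.3
# 2020-01-01 03:00:00         3.6          0.1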
Then loop over the second level of the columns, selecting each one with DataFrame.xs. Convert the datetimes to strings (or format them another way if you need), build the dictionary, and finally write it to a file:
for lvl in df.columns.levels[1]:
    print(lvl)
    # select the production and consumption columns for this building
    df1 = df.xs(lvl, axis=1, level=1).reset_index()
    # datetimes are not JSON serializable, so convert them to strings
    df1['date'] = df1['date'].astype(str)
    d = {lvl: df1.to_dict(orient='list')}
    #print(d)
    with open(f'{lvl}.json', mode='w', encoding="utf8") as f:
        json.dump(d, f)
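As a quick check, reading one of the generated files back should give roughly the structure asked for (a sketch assuming the sample values above; 'build_1.json' is one of the files the loop writes):
with open('build_1.json', encoding="utf8") as f:
    content = json.load(f)
print(content)
# {'build_1': {'date': ['2020-01-01 00:00:00', '2020-01-01 01:00:00', ...],
#              'production': [1.2, 3.3, 4.1, 3.6, ...],
#              'consumption': [0.4, 0.3, 0.3, 0.1, ...]}}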
Upvotes: 1