Reputation: 267
I have two excel files:
2020-01-consumption.xlsx
2020-01-production.xlsx
print(pd.read_excel('2020-01-production.xlsx', index_col=0).head(4))
build_1 build_2 build_3 build_4 ...
date ...
2020-01-01 00:00:00 1.2 4.2 4.3 7.0 ...
2020-01-01 01:00:00 3.3 1.9 5.3 3.5 ...
2020-01-01 02:00:00 4.1 2.7 6.0 2.6 ...
2020-01-01 03:00:00 3.6 6.0 7.1 7.2 ...
print(pd.read_excel('2020-01-consumption.xlsx', index_col=0).head(4))
build_1 build_2 build_3 build_4 ...
date ...
2020-01-01 00:00:00 0.4 1.0 0.1 1.0 ...
2020-01-01 01:00:00 0.3 0.9 0.0 0.4 ...
2020-01-01 02:00:00 0.3 0.5 0.0 0.4 ...
2020-01-01 03:00:00 0.1 0.5 0.4 0.4 ...
The columns and indexes are the same in both files. I'm trying to set up a for loop that saves each column as a JSON file, changing the structure of the data to this:
with open('build_1.json', encoding="utf8") as f:  # the new file's name should be the column name
    content = json.load(f)
print(content)
{'build_1': {  # the key is the column name
    'date': ['2020-01-01 00:00:00', '2020-01-01 01:00:00', '2020-01-01 02:00:00', '2020-01-01 03:00:00', ...],  # the index name becomes a key
    'production': [1.2, 3.3, 4.1, 3.6, ...],  # the excel file name becomes a key
    'consumption': [0.4, 0.3, 0.3, 0.1, ...]}}  # the excel file name becomes a key
I have many dataframes like production and consumption; I'm only showing two as an example. How can I achieve this structure? Is it possible?
Upvotes: 1
Views: 23
Reputation: 863301
You can use concat with the keys parameter to create a MultiIndex in the columns:
import json
import pandas as pd

df1 = pd.read_excel('2020-01-production.xlsx', index_col=0)
df2 = pd.read_excel('2020-01-consumption.xlsx', index_col=0)
df = pd.concat([df1, df2], keys=['production', 'consumption'], axis=1)
print(df)
production consumption \
build_1 build_2 build_3 build_4 build_1 build_2
date
2020-01-01 00:00:00 1.2 4.2 4.3 7.0 0.4 1.0
2020-01-01 01:00:00 3.3 1.9 5.3 3.5 0.3 0.9
2020-01-01 02:00:00 4.1 2.7 6.0 2.6 0.3 0.5
2020-01-01 03:00:00 3.6 6.0 7.1 7.2 0.1 0.5
build_3 build_4
date
2020-01-01 00:00:00 0.1 1.0
2020-01-01 01:00:00 0.0 0.4
2020-01-01 02:00:00 0.0 0.4
2020-01-01 03:00:00 0.4 0.4
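For example, selecting a single building from the combined frame with DataFrame.xs gives one small frame holding both measures (a sketch assuming the sample data shown above; 'build_1' is just the first column name from the example):
one = df.xs('build_1', axis=1, level=1)
print(one)
#                      production  consumption
# date
# 2020-01-01 00:00:00         1.2          0.4
# 2020-01-01 01:00:00         3.3          0.3
# 2020-01-01 02:00:00         4.1          0.3
# 2020-01-01 03:00:00         3.6          0.1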
Then loop over the second level of the columns, selecting each one with DataFrame.xs. Convert the datetimes to strings (or format them another way if you need), build the dictionary, and finally write it to a file:
for lvl in df.columns.levels[1]:
    print(lvl)
    # select the production and consumption columns for this building
    df1 = df.xs(lvl, axis=1, level=1).reset_index()
    # datetimes are not JSON serializable, so convert them to strings
    df1['date'] = df1['date'].astype(str)
    d = {lvl: df1.to_dict(orient='list')}
    #print(d)
    with open(f'{lvl}.json', mode='w', encoding="utf8") as f:
        json.dump(d, f)
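As a quick check, reading one of the generated files back should give roughly the structure asked for (a sketch assuming the sample values above; 'build_1.json' is one of the files the loop writes):
with open('build_1.json', encoding="utf8") as f:
    content = json.load(f)
print(content)
# {'build_1': {'date': ['2020-01-01 00:00:00', '2020-01-01 01:00:00', ...],
#              'production': [1.2, 3.3, 4.1, 3.6, ...],
#              'consumption': [0.4, 0.3, 0.3, 0.1, ...]}}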
Upvotes: 1