Reputation: 851
i have the following dataframe (i have simplified it)
Column0 Column1 Type
Asset Code
R0083TX3P3PATX999 0.00 0.00 variable_name_1
R0084TX3P3WTXNM99 55.74 55.74 variable_name_1
R0087KY2P2KY99999 265.35 265.35 variable_name_1
T7001OK2P2OK99999 0.00 0.00 variable_name_2
T7029LA3P3SLA9999 0.00 0.00 variable_name_2
T7032CA5P5SW99999 0.00 0.00 variable_name_2
T7001OK2P2OK99999 0.00 0.00 variable_name_3
T7029LA3P3SLA9999 9.00 9.00 variable_name_3
T7032CA5P5SW99999 14.00 14.00 variable_name_3
it is actually 3 different dataframes that i have joined together after adding the 'type' column so i could tell where it came from
my end goal is to create a nested json that at the top has the "ColumnName" as the key, and then second has the "Asset Code" as the next key, which has a third nested object which is just the associated variables and values.
the intent of this would be that the json could be accessed by
data['Column0']['Asset-Code']
and the list of values would be present
the output would be a json object with this format:
{
"Column1": {
"R0083TX3P3PATX999": {
"variable_name_1": 0
},
"R0084TX3P3WTXNM99": {
"variable_name_1": 55.74
},
"R0087KY2P2KY99999": {
"variable_name_1": 265.35
},
"T7001OK2P2OK99999": {
"variable_name_2": 0,
"variable_name_3": 0
},
"T7029LA3P3SLA9999": {
"variable_name_2": 0,
"variable_name_3": 9.0
},
"T7032CA5P5SW99999": {
"variable_name_2": 0,
"variable_name_3": 14
}
},
"Column2": {
"R0083TX3P3PATX999": {
"variable_name_1": 0
},
"R0084TX3P3WTXNM99": {
"variable_name_1": 55.74
},
"R0087KY2P2KY99999": {
"variable_name_1": 265.35
},
"T7001OK2P2OK99999": {
"variable_name_2": 2,
"variable_name_3": 3
},
"T7029LA3P3SLA9999": {
"variable_name_2": 2,
"variable_name_3": 9.0
},
"T7032CA5P5SW99999": {
"variable_name_2": 0,
"variable_name_3": 14
}
}
}
I am unsure how to do this, does this mean my new master dataframe that contains all of them needs to be reindexed (or multi indexed?) i was looking at groupby functions as well but was unsure how to apply them, specifically because of that third nested object. initially before that it was very easy because i was just exporting the to_json
with orient=columns
and it worked perfectly, but only for the two levels of data.
Upvotes: 1
Views: 73
Reputation: 3752
One way you can do it is by loop through the columns and transform to dict as an index
:
res = {col: df.pivot('Asset Code','Type', col)\
.replace({np.nan:None})\
.to_dict(orient='index') for col in 'Column1','Column2']}
Output
{'Column1': {'R0083TX3P3PATX999': {'variable_name_1': 0.0,
'variable_name_2': None,
'variable_name_3': None},
'R0084TX3P3WTXNM99': {'variable_name_1': 55.74,
'variable_name_2': None,
'variable_name_3': None},
'R0087KY2P2KY99999': {'variable_name_1': 265.35,
'variable_name_2': None,
'variable_name_3': None},
'T7001OK2P2OK99999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 0.0},
'T7029LA3P3SLA9999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 9.0},
'T7032CA5P5SW99999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 14.0}},
'Column2': {'R0083TX3P3PATX999': {'variable_name_1': 0.0,
'variable_name_2': None,
'variable_name_3': None},
'R0084TX3P3WTXNM99': {'variable_name_1': 55.74,
'variable_name_2': None,
'variable_name_3': None},
'R0087KY2P2KY99999': {'variable_name_1': 265.35,
'variable_name_2': None,
'variable_name_3': None},
'T7001OK2P2OK99999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 0.0},
'T7029LA3P3SLA9999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 9.0},
'T7032CA5P5SW99999': {'variable_name_1': None,
'variable_name_2': 0.0,
'variable_name_3': 14.0}}}
Below the code I used to create the df
:
df = pd.DataFrame([
['R0083TX3P3PATX999', 0.00, 0.00, 'variable_name_1'],
['R0084TX3P3WTXNM99', 55.74 , 55.74, 'variable_name_1'],
['R0087KY2P2KY99999', 265.35 , 265.35, 'variable_name_1'],
['T7001OK2P2OK99999', 0.00 , 0.00, 'variable_name_2'],
['T7029LA3P3SLA9999', 0.00 , 0.00, 'variable_name_2'],
['T7032CA5P5SW99999', 0.00 , 0.00, 'variable_name_2'],
['T7001OK2P2OK99999', 0.00 , 0.00, 'variable_name_3'],
['T7029LA3P3SLA9999', 9.00 , 9.00, 'variable_name_3'],
['T7032CA5P5SW99999', 14.00 , 14.00, 'variable_name_3'],
], columns = ['Asset Code','Column1','Column2', 'Type'])
Upvotes: 2