Josh
Josh

Reputation: 851

Pandas grouping dataframe and creating nested json

i have the following dataframe (i have simplified it)

                  Column0 Column1             Type
Asset Code
R0083TX3P3PATX999    0.00    0.00  variable_name_1
R0084TX3P3WTXNM99   55.74   55.74  variable_name_1
R0087KY2P2KY99999  265.35  265.35  variable_name_1
T7001OK2P2OK99999    0.00    0.00  variable_name_2
T7029LA3P3SLA9999    0.00    0.00  variable_name_2
T7032CA5P5SW99999    0.00    0.00  variable_name_2
T7001OK2P2OK99999    0.00    0.00  variable_name_3
T7029LA3P3SLA9999    9.00    9.00  variable_name_3
T7032CA5P5SW99999   14.00   14.00  variable_name_3

it is actually 3 different dataframes that i have joined together after adding the 'type' column so i could tell where it came from

my end goal is to create a nested json that at the top has the "ColumnName" as the key, and then second has the "Asset Code" as the next key, which has a third nested object which is just the associated variables and values.

the intent of this would be that the json could be accessed by data['Column0']['Asset-Code'] and the list of values would be present

the output would be a json object with this format:

{
  "Column1": {
    "R0083TX3P3PATX999": {
      "variable_name_1": 0
    },
    "R0084TX3P3WTXNM99": {
      "variable_name_1": 55.74
    },
    "R0087KY2P2KY99999": {
      "variable_name_1": 265.35
    },
    "T7001OK2P2OK99999": {
      "variable_name_2": 0,
      "variable_name_3": 0
    },
    "T7029LA3P3SLA9999": {
      "variable_name_2": 0,
      "variable_name_3": 9.0
    },
    "T7032CA5P5SW99999": {
      "variable_name_2": 0,
      "variable_name_3": 14
    }
  },
  "Column2": {
    "R0083TX3P3PATX999": {
      "variable_name_1": 0
    },
    "R0084TX3P3WTXNM99": {
      "variable_name_1": 55.74
    },
    "R0087KY2P2KY99999": {
      "variable_name_1": 265.35
    },
    "T7001OK2P2OK99999": {
      "variable_name_2": 2,
      "variable_name_3": 3
    },
    "T7029LA3P3SLA9999": {
      "variable_name_2": 2,
      "variable_name_3": 9.0
    },
    "T7032CA5P5SW99999": {
      "variable_name_2": 0,
      "variable_name_3": 14
    }
  }
}

I am unsure how to do this, does this mean my new master dataframe that contains all of them needs to be reindexed (or multi indexed?) i was looking at groupby functions as well but was unsure how to apply them, specifically because of that third nested object. initially before that it was very easy because i was just exporting the to_json with orient=columns and it worked perfectly, but only for the two levels of data.

Upvotes: 1

Views: 73

Answers (1)

rrcal
rrcal

Reputation: 3752

One way you can do it is by loop through the columns and transform to dict as an index:

res = {col: df.pivot('Asset Code','Type', col)\
                              .replace({np.nan:None})\
                              .to_dict(orient='index') for col in 'Column1','Column2']}

Output

{'Column1': {'R0083TX3P3PATX999': {'variable_name_1': 0.0,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0084TX3P3WTXNM99': {'variable_name_1': 55.74,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0087KY2P2KY99999': {'variable_name_1': 265.35,
   'variable_name_2': None,
   'variable_name_3': None},
  'T7001OK2P2OK99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 0.0},
  'T7029LA3P3SLA9999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 9.0},
  'T7032CA5P5SW99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 14.0}},
 'Column2': {'R0083TX3P3PATX999': {'variable_name_1': 0.0,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0084TX3P3WTXNM99': {'variable_name_1': 55.74,
   'variable_name_2': None,
   'variable_name_3': None},
  'R0087KY2P2KY99999': {'variable_name_1': 265.35,
   'variable_name_2': None,
   'variable_name_3': None},
  'T7001OK2P2OK99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 0.0},
  'T7029LA3P3SLA9999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 9.0},
  'T7032CA5P5SW99999': {'variable_name_1': None,
   'variable_name_2': 0.0,
   'variable_name_3': 14.0}}}

Below the code I used to create the df:

df = pd.DataFrame([
    ['R0083TX3P3PATX999',    0.00,    0.00,  'variable_name_1'],
    ['R0084TX3P3WTXNM99',   55.74 ,  55.74,  'variable_name_1'],
    ['R0087KY2P2KY99999',  265.35 , 265.35,  'variable_name_1'],
    ['T7001OK2P2OK99999',    0.00 ,   0.00,  'variable_name_2'],
    ['T7029LA3P3SLA9999',    0.00 ,   0.00,  'variable_name_2'],
    ['T7032CA5P5SW99999',    0.00 ,   0.00,  'variable_name_2'],
    ['T7001OK2P2OK99999',    0.00 ,   0.00,  'variable_name_3'],
    ['T7029LA3P3SLA9999',    9.00 ,   9.00,  'variable_name_3'],
    ['T7032CA5P5SW99999',   14.00 ,  14.00,  'variable_name_3'],
], columns = ['Asset Code','Column1','Column2', 'Type'])

Upvotes: 2

Related Questions