akyayik
akyayik

Reputation: 664

Creating a hierarchical dictionary from DataFrame column

I have a dataframe like this;

+------------+------------+-------+
|    Date    | Total Cars | Lanes |  
+------------+------------+-------+
| 2019-10-20 |          5 |     2 |  
| 2019-10-23 |         15 |     3 |
| 2020-01-20 |         23 |     2 |
+------------+------------+-------+

I want to return a dictionary like this;

{
    "y2019":{
        "Year Total Cars":20,
        "Year Total Lanes":5,
        "M10":{
            "Month All Cars":20,
            "Month All Lanes":5,
            "Day20":{
                "Day All Cars":5,
                "Day All Lanes":2
            },
            "Day 23":{
                "Day All Cars":15,
                "Day All Lanes":3
            }
        }
    },
    "y2020":{
        "Year Total Cars":23,
        "Year Total Lanes":2,
        "M10":{
            "Month Total Cars":23,
            "Month Total Lanes":2,
            "Day20":{
                "Day All Cars":23,
                "Day All Lanes":2
            }
        }
    }
}

So far Tried to use df.resample and tried to create a nested dict out of this but wasn't successful. Are there any other elegant ways to tackle this in Pandas?

I tried to unpack dates into Y-M-D and added these as into the dataframe. Then I created a groupby object to iterate through.

date_counter_df.loc[:,'year'] = [x.year for x in date_counter_df.index]
date_counter_df.loc[:,'month'] = [x.month for x in date_counter_df.index]
date_counter_df.loc[:,'day'] = [x.day for x in date_counter_df.index]

result = {}
for to_unpack, df_res in date_counter_df.groupby(['year','month','day']):
    year, month, day = to_unpack
    try:
        result[year]
    except KeyError:
        result[year] = {}
    try:
        result[year][month]
    except KeyError:
        result[year][month] = {}
    try:
        result[year][month][day]
    except KeyError:
        result[year][month][day] = {}

    result[year][month][day] = df_res

Result looks like this from the original dataset;

{2019: {10: {17:               sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                          
   2019-10-17  11219V22151                             0                     0   

                  1     2  3  4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                 
   2019-10-17  2702  2615  0  0   5317                           2614   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2019-10-17                             2703  2019     10   17  ,
   30:               sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                          
   2019-10-30  11219V22151                             0                     0   

                  1     2  3  4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                 
   2019-10-30  2729  2589  0  0   5318                           2589   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2019-10-30                             2729  2019     10   30  },
  12: {28:                 sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                            
   2019-12-28  61942V2809673                          3134                  3461   

                 1    2     3     4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                     
   2019-12-28  333  494  2801  2967   6595                              0   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2019-12-28                                0  2019     12   28  }},
 2020: {2: {19:                 sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                            
   2020-02-19  71445V2809674                          5006                  5202   

                 1    2     3     4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                     
   2020-02-19  686  747  4320  4455  10208                              0   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2020-02-19                                0  2020      2   19  }}}

Now I only will need to add the totals for both Years and Months as dict items.

Upvotes: 0

Views: 214

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30971

Start from conversion of Date column to Datetime (if till now it is of string type):

df.Date = pd.to_datetime(df.Date)

Then define 3 functions:

  1. A function to process each row, from a monthly group:

    def rowFn(row):
        return { 'Day All Cars': row['Total Cars'], 'Day All Lanes': row.Lanes }
    
  2. A function to process each monthly group:

    def monthGrpFun(grp):
        dct = { 'Month All Cars': grp['Total Cars'].sum(),
                'Month All Lanes': grp.Lanes.sum() }
        for _, row in grp.iterrows():
            dct[f'Day {row.Date.day:02}'] = rowFn(row)
        return dct
    
  3. A function to process each yearly group:

    def yearGrpFun(grp):
        dct = { 'Year Total Cars': grp['Total Cars'].sum(),
                'Year Total Lanes': grp.Lanes.sum() }
        for key, grp2 in grp.groupby(grp.Date.dt.month):
            dct[f'M{key:02}'] = monthGrpFun(grp2)
        return dct
    

And to get the result, run:

dct = {}
for key, grp in df.groupby(df.Date.dt.year):
    dct[f'y{key}'] = yearGrpFun(grp)

The result for your data (reformatted for readability) is:

{
  'y2019': {
    'Year Total Cars': 20,
    'Year Total Lanes': 5,
    'M10': {
      'Month All Cars': 20,
      'Month All Lanes': 5,
      'Day 20': {'Day All Cars': 5, 'Day All Lanes': 2},
      'Day 23': {'Day All Cars': 15, 'Day All Lanes': 3}
    }
  },
  'y2020': {
    'Year Total Cars': 23,
    'Year Total Lanes': 2,
    'M01': {
      'Month All Cars': 23,
      'Month All Lanes': 2,
      'Day 20': {'Day All Cars': 23, 'Day All Lanes': 2}
    }
  }
}

Upvotes: 2

Related Questions