Guido Muscioni

Pandas - DateTime groupby to structured dict

I have a dataset which contains a DateTime field. I need to group it by hour and store each group in a nested dictionary with the following structure:

{year_1:
    {month_1:
        {week_1:
            {day_1:
                {hour_1: df_1, hour_2: df_2}
            },
         week_2:
            {day_1:
                {hour_1: df_1}
            }
        },
     month_3:
        {week_1:
            {day_1:
                {hour_1: df_1, hour_2: df_2}
            }
        }
    },
 year_2:
    {month_5:
        {week_1:
            {day_1:
                {hour_2: df_2}
            }
        }
    }
}

To do that I am using the following code:

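from datetime import datetime

import pandas as pd

# pd.datetime was deprecated in pandas 1.0 and removed in 2.0, so the standard-library datetime is used.
df = pd.DataFrame({'date': [datetime(2015,3,17,2), datetime(2014,3,24,3), datetime(2014,3,17,4)], 'hdg_id': [4041,4041,4041], 'stock': [1.0,1.0,1.0]})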
df.loc[:,'year'] = [x.year for x in df['date']]
df.loc[:,'month'] = [x.month for x in df['date']]
df.loc[:,'week'] = [x.week for x in df['date']]
df.loc[:,'day'] = [x.day for x in df['date']]
df.loc[:,'hour'] = [x.hour for x in df['date']]

result = {}
# Group keys are listed in the same order in which they are unpacked below.
for to_unpack, df_hour in df.groupby(['year','month','week','day','hour']):
    year, month, week, day, hour = to_unpack
    try:
        result[year]
    except KeyError:
        result[year] = {}
    try:
        result[year][month]
    except KeyError:
        result[year][month] = {}
    try:
        result[year][month][week]
    except KeyError:
        result[year][month][week] = {}
    try:
        result[year][month][week][day]
    except KeyError:
        result[year][month][week][day] = {}

    result[year][month][week][day][hour] = df_hour

As you can see, this is pretty much a brute-force solution, and I am looking for something cleaner and easier to understand. It is also extremely slow. I tried different ways of grouping (Python Pandas Group by date using datetime data), and I also tried a MultiIndex with each component of the datetime (Pandas DataFrame with MultiIndex: Group by year of DateTime level values). However, the problem is always how to build the dict. Ideally, I would like to just write something like:

result[year][month][week][day][hour] = df_hour

but to the best of my knowledge, I first need to initialize each dict.
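On the slowness: the helper columns in the code above are filled with Python-level list comprehensions over Timestamp objects. A possible vectorized sketch uses the .dt accessor instead. It assumes pandas 1.1 or later (for Series.dt.isocalendar, whose week column is the ISO week number and should match the x.week values above) and reuses the question's toy data; the per-group loop that builds the dict is not affected by this change.

```python
from datetime import datetime

import pandas as pd

# Toy data from the question.
df = pd.DataFrame({
    'date': [datetime(2015, 3, 17, 2),
             datetime(2014, 3, 24, 3),
             datetime(2014, 3, 17, 4)],
    'hdg_id': [4041, 4041, 4041],
    'stock': [1.0, 1.0, 1.0],
})

# Vectorized component extraction via the .dt accessor (no Python loop).
dates = df['date'].dt
df['year'] = dates.year
df['month'] = dates.month
df['week'] = dates.isocalendar().week  # ISO week number
df['day'] = dates.day
df['hour'] = dates.hour

# Group in the same order in which the key tuple is unpacked later.
keys = ['year', 'month', 'week', 'day', 'hour']
groups = {key: df_hour for key, df_hour in df.groupby(keys)}
```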

Answers (1)

piRSquared

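You need dict.setdefault, which returns the value for a key if the key is present and otherwise inserts the given default and returns it, so the calls can be chained: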

result = {}
for to_unpack, df_hour in df.groupby(['year','month','week','day','hour']):
    year, month, week, day, hour = to_unpack

    result.setdefault(year, {}) \
          .setdefault(month, {}) \
          .setdefault(week, {}) \
          .setdefault(day, {}) \
          .setdefault(hour, df_hour)
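Because the groupby key is already a tuple in (year, month, week, day, hour) order, the setdefault chain can be generalized to any depth without unpacking. A minimal sketch with a hypothetical helper nested_set, which uses functools.reduce to walk every key except the last; the string values are placeholders standing in for the per-hour DataFrames:

```python
from functools import reduce


def nested_set(tree, keys, value):
    """Store value at tree[k0][k1]...[kn], creating intermediate dicts.

    Hypothetical helper: every key except the last goes through
    dict.setdefault, which returns the existing sub-dict or inserts
    and returns a fresh empty one.
    """
    *parents, last = keys
    leaf = reduce(lambda node, key: node.setdefault(key, {}), parents, tree)
    leaf[last] = value
    return tree


result = {}
# Keys in (year, month, week, day, hour) order, as produced by groupby.
for key, value in [((2014, 3, 12, 17, 4), 'df_a'),
                   ((2014, 3, 13, 24, 3), 'df_b'),
                   ((2014, 3, 12, 17, 5), 'df_c')]:
    nested_set(result, key, value)

print(result)
# {2014: {3: {12: {17: {4: 'df_a', 5: 'df_c'}}, 13: {24: {3: 'df_b'}}}}}
```

With the real data, the loop body would become nested_set(result, key, df_hour) inside for key, df_hour in df.groupby(['year','month','week','day','hour']).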

You can also subclass dict so that reading a missing key creates the next level automatically:

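class Fict(dict):
    # Reading a missing key inserts and returns a new, empty Fict,
    # so chains like result[a][b][c] = v work at any depth.
    def __getitem__(self, item):
        return super().setdefault(item, type(self)())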

result = Fict()

for to_unpack, df_hour in df.groupby(['year','month','week','day','hour']):
    year, month, week, day, hour = to_unpack

    result[year][month][week][day][hour] = df_hour
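The standard library's collections.defaultdict gives the same auto-creating behavior without a custom class. A sketch, assuming it is acceptable for the result to be made of defaultdict objects until it is converted back; tree and to_plain_dict are hypothetical names, and the string values stand in for DataFrames:

```python
from collections import defaultdict


def tree():
    """Recursive defaultdict: any missing key becomes another tree()."""
    return defaultdict(tree)


def to_plain_dict(node):
    """Convert nested defaultdicts back into ordinary dicts."""
    if isinstance(node, defaultdict):
        return {key: to_plain_dict(child) for key, child in node.items()}
    return node


result = tree()
# Keys are (year, month, week, day, hour); no level needs initializing.
result[2014][3][12][17][4] = 'df_a'
result[2014][3][13][24][3] = 'df_b'

plain = to_plain_dict(result)
print(plain)  # {2014: {3: {12: {17: {4: 'df_a'}}, 13: {24: {3: 'df_b'}}}}}

# Caveat shared with Fict: reading a missing key inserts an empty subtree,
# so use `in` for membership tests instead of indexing.
result[1999]
print(1999 in result)  # True
```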
