kingmakerking

Reputation: 2089

Pandas groupby hour of the day to dictionary

I have a pandas data frame as follows:

date                | Item   | count
------------------------------------
2016-12-06 10:45:08 |  Item1 |  60
2016-12-06 10:45:08 |  Item2 |  145
2016-12-06 09:45:00 |  Item1 |  60
2016-12-06 09:44:54 |  Item3 |  600
2016-12-06 09:44:48 |  Item4 |  15
2016-12-06 11:45:08 |  Item1 |  60
2016-12-06 10:45:08 |  Item2 |  14
2016-11-06 09:45:00 |  Item1 |  62
2016-11-06 09:44:54 |  Item3 |  6
2016-11-06 09:44:48 |  Item4 |  15

I am trying to group the items by, let's say, hour of the day (or later just by day) to get statistics such as the list of items sold per hour or per day.

While I am still far from fetching these statistics, I am stuck at the grouping by time. Initially, print df.dtypes showed

date    object
Item    object
count   int64
dtype: object

So, I used the following line of code to convert the date column to a pandas datetime type.

df['date'] = pd.to_datetime(df['date'])

and now, print df.dtypes yields:

date    datetime64[ns]
Item    object
count   int64
dtype: object

However, when I try to group by the date column using TimeGrouper with the following lines of code

from pandas.tseries.resample import TimeGrouper 
print df.groupby([df['date'],pd.TimeGrouper(freq='Min')])

I get the following TypeError, even though, as per the suggestions given here or here, converting with pd.to_datetime should have resolved the issue.

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

I have no clue how to solve this issue to proceed to the statistics I am looking for. Any tips on resolving this error and using TimeGrouper to seek the statistics preferably in a dictionary format (or anything that makes more sense) would be much appreciated.
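For context, the TypeError occurs because TimeGrouper operates on the DataFrame's index, which here is still the default RangeIndex; the converted datetimes live in the date column, not the index. A minimal sketch of two workarounds, assuming a pandas version where pd.Grouper accepts a key argument:

# Point the grouper at the 'date' column explicitly, leaving the index alone:
print df.groupby(pd.Grouper(key='date', freq='H')).Item.unique()

# Or move 'date' into the index so TimeGrouper finds a DatetimeIndex:
print df.set_index('date').groupby(pd.TimeGrouper(freq='H')).Item.unique()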

Upvotes: 5

Views: 871

Answers (2)

jezrael

Reputation: 863246

You can group by a numpy array of datetimes with the minutes and seconds removed, i.e. truncated to the hour:

print (df['date'].values.astype('<M8[h]'))
['2016-12-06T10' '2016-12-06T10' '2016-12-06T09' '2016-12-06T09'
 '2016-12-06T09' '2016-12-06T11' '2016-12-06T10' '2016-11-06T09'
 '2016-11-06T09' '2016-11-06T09']

print (df.groupby(df['date'].values.astype('<M8[h]')).Item.unique())
2016-11-06 09:00:00    [Item1, Item3, Item4]
2016-12-06 09:00:00    [Item1, Item3, Item4]
2016-12-06 10:00:00           [Item1, Item2]
2016-12-06 11:00:00                  [Item1]
Name: Item, dtype: object

print (df.groupby(df['date'].values.astype('<M8[h]')).Item
         .apply(lambda x: x.unique().tolist()).to_dict())
{Timestamp('2016-11-06 09:00:00'): ['Item1', 'Item3', 'Item4'], 
 Timestamp('2016-12-06 09:00:00'): ['Item1', 'Item3', 'Item4'], 
 Timestamp('2016-12-06 10:00:00'): ['Item1', 'Item2'], 
 Timestamp('2016-12-06 11:00:00'): ['Item1']}

print (df.groupby(df['date'].values.astype('<M8[D]')).Item
         .apply(lambda x: x.unique().tolist()).to_dict())
{Timestamp('2016-11-06 00:00:00'): ['Item1', 'Item3', 'Item4'], 
 Timestamp('2016-12-06 00:00:00'): ['Item1', 'Item2', 'Item3', 'Item4']}
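The same flooring can also be written with the datetime accessor instead of a raw numpy cast; a minimal equivalent sketch, assuming pandas 0.18+ where Series.dt.floor is available (the resulting dictionaries match the two above, with the keys kept as datetime64[ns]):

print (df.groupby(df['date'].dt.floor('h')).Item
         .apply(lambda x: x.unique().tolist()).to_dict())

print (df.groupby(df['date'].dt.floor('d')).Item
         .apply(lambda x: x.unique().tolist()).to_dict())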

Thank you Jeff for the suggestion to use round:

print (df.groupby(df['date'].dt.round('h')).Item
         .apply(lambda x: x.unique().tolist()).to_dict())

{Timestamp('2016-11-06 10:00:00'): ['Item1', 'Item3', 'Item4'], 
 Timestamp('2016-12-06 12:00:00'): ['Item1'], 
 Timestamp('2016-12-06 10:00:00'): ['Item1', 'Item3', 'Item4'], 
 Timestamp('2016-12-06 11:00:00'): ['Item1', 'Item2']}

print (df.groupby(df['date'].dt.round('d')).Item
         .apply(lambda x: x.unique().tolist()).to_dict())
{Timestamp('2016-11-06 00:00:00'): ['Item1', 'Item3', 'Item4'], 
 Timestamp('2016-12-06 00:00:00'): ['Item1', 'Item2', 'Item3', 'Item4']}
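Note that round maps each timestamp to the nearest hour rather than truncating, which is why the keys differ from the astype version: 2016-12-06 11:45:08 rounds up to the 12:00 bucket, and the 09:44/09:45 rows round up to 10:00. Use the floor variants above if you want strict truncation.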

Upvotes: 3

piRSquared

Reputation: 294488

sold = df.set_index('date').Item.resample('H').agg({'Sold': 'unique'})
sold[sold.Sold.str.len() > 0]

                                      Sold
date                                      
2016-11-06 09:00:00  [Item4, Item3, Item1]
2016-12-06 09:00:00  [Item4, Item3, Item1]
2016-12-06 10:00:00         [Item1, Item2]
2016-12-06 11:00:00                [Item1]
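If you want the dictionary format asked for in the question, the filtered column converts directly; a small sketch (to_dict is standard pandas):

sold[sold.Sold.str.len() > 0].Sold.to_dict()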

Upvotes: 3
