gudvinr

Reputation: 124

Pandas group by chunks not single values

I'm confused about how to group data in pandas.

I have a data set (over 60k rows) with 3 columns:

2015/12/18 11:12:49 +0300   d1  b1
2015/12/18 11:12:50 +0300   d2  b2
2015/12/18 11:13:08 +0300   d1  b3
2015/12/18 11:13:36 +0300   d2  b4
2015/12/18 11:13:43 +0300   d2  b5
2015/12/18 11:14:21 +0300   d2  c0
2015/12/18 11:14:42 +0300   d2  c1
2015/12/18 11:15:13 +0300   d1  c2
2015/12/18 11:15:19 +0300   d3  c3

I need to count rows grouped by weekday and by time period (say, 4-hour chunks: 0-4, 4-8, 8-12, etc.), and then get a single series of counts for the periods within a week.

I can get the count for every hour of the week (time is the name of the first column):

dind = pd.DatetimeIndex(df.time)
gr = df.groupby([dind.weekday, dind.hour])
gr.size()

But I can't figure out how to group by chunks and then merge the resulting MultiIndex into a single index column.

I hope that was a clear description of the problem.

Upvotes: 2

Views: 1349

Answers (1)

piRSquared

Reputation: 294488

The first part of your question, how to group by 4-hour chunks, is easy and is addressed in both options below: use df.index.hour // 4 as a grouping key. Both options assume the timestamps are the DataFrame's index (a DatetimeIndex).

The second part is vague, as there are several ways to interpret "merge into a single column". Here are two alternatives.
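To make the chunking concrete, here is a minimal sketch that rebuilds the sample rows from the question and groups them by weekday and 4-hour chunk. The column names dev and val are made up for illustration (the question does not name them), and the timestamps are assumed to parse with the explicit format shown.

```python
import pandas as pd

# Timestamps copied from the question's sample data.
times = pd.to_datetime(
    [
        "2015/12/18 11:12:49 +0300",
        "2015/12/18 11:12:50 +0300",
        "2015/12/18 11:13:08 +0300",
        "2015/12/18 11:13:36 +0300",
        "2015/12/18 11:13:43 +0300",
        "2015/12/18 11:14:21 +0300",
        "2015/12/18 11:14:42 +0300",
        "2015/12/18 11:15:13 +0300",
        "2015/12/18 11:15:19 +0300",
    ],
    format="%Y/%m/%d %H:%M:%S %z",
)

# "dev" and "val" are hypothetical column names, not taken from the question.
df = pd.DataFrame(
    {
        "dev": ["d1", "d2", "d1", "d2", "d2", "d2", "d2", "d1", "d3"],
        "val": ["b1", "b2", "b3", "b4", "b5", "c0", "c1", "c2", "c3"],
    },
    index=times,
)

# Integer division maps hours 0-3 to chunk 0, 4-7 to chunk 1, ..., 20-23 to chunk 5.
chunk = df.index.hour // 4

# Count rows per (weekday, chunk) pair; weekday 0 is Monday.
counts = df.groupby([df.index.weekday, chunk]).size()
print(counts)
```

All nine sample rows fall on a Friday (weekday 4) in the 11 o'clock hour, so they land in chunk 11 // 4 = 2.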

Option 1: replace the MultiIndex with a flat index of (weekday, chunk) tuples

gpd = df.groupby([df.index.weekday, df.index.hour // 4]).size()
gpd.index = gpd.index.to_series()
gpd

(4, 2)    9
dtype: int64

Option 2: replace the MultiIndex with string labels of the form weekday_chunk

gpd = df.groupby([df.index.weekday, df.index.hour // 4]).size()
gpd.index = ['{}_{}'.format(*i) for i in gpd.index]
gpd

4_2    9
dtype: int64
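A third reading of "single index" is to build one readable label per row up front and group by that label directly, so no MultiIndex is ever created. The sketch below is only an illustration under that assumption: the label format ("Friday 08-12") and the column name dev are invented here, not taken from the question.

```python
import pandas as pd

# A few timestamps in the question's format; "dev" is a hypothetical column name.
times = pd.to_datetime(
    [
        "2015/12/18 11:12:49 +0300",
        "2015/12/18 11:12:50 +0300",
        "2015/12/18 11:13:08 +0300",
    ],
    format="%Y/%m/%d %H:%M:%S %z",
)
df = pd.DataFrame({"dev": ["d1", "d2", "d1"]}, index=times)

# Start hour of each row's 4-hour chunk: 0, 4, 8, 12, 16 or 20.
chunk_start = (df.index.hour // 4) * 4

# One label per row, e.g. "Friday 08-12". Wrapping the labels in pd.Index makes
# groupby treat them as per-row group keys rather than as column names.
labels = pd.Index(
    [
        f"{day} {int(start):02d}-{int(start) + 4:02d}"
        for day, start in zip(df.index.day_name(), chunk_start)
    ],
    name="period",
)

counts = df.groupby(labels).size()
print(counts)
```

Note that string labels sort alphabetically, not in calendar order, so the numeric weekday_chunk form from Option 2 may be the better choice if the order of the result matters.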

Upvotes: 2
