helloB
helloB

Reputation: 3582

Best way to groupby in a Pandas DataFrame and get back another DataFrame with desirable column names?

I'm new to Pandas (from R and data.table) and trying to learn best practices. I'd like to convert the following into as few steps as possible:

avg_data = dt[dt['datatype'] == 0].groupby(
    [pd.TimeGrouper(key = 'time', freq = 'D')).agg('mean')
a = pd.DataFrame(moment_count)
a.columns = ['value']
a['time'] = a.index

In particular, I'd like to do a groupby and then end up with a DataFrame that maintains a time column rather than turning that into an index and which names the aggregate results a name I choose. The end result should be this:

enter image description here

from this:

enter image description here

What's the most efficient way to do this?

Upvotes: 0

Views: 62

Answers (1)

piRSquared
piRSquared

Reputation: 294218

Since you didn't provide reproducible code, I created my own example.

The main point is to use resample (documentation) with a 'D' (for daily) frequency type. Then follow that up with an aggregation function. I used sum.

Setup: copy paste this to set up problem

import pandas as pd
import numpy as np

times = pd.date_range('2016-05-10', '2016-05-15', freq='H')

np.random.seed([3, 1415])
data = np.random.rand(len(times), 5)

df = pd.DataFrame(data * 5, index=times, columns=list('ABCDE'), dtype=np.int64)

Solution

df.resample('D').sum()

Looks like:

             A   B   C   D   E
2016-05-10  47  54  44  53  46
2016-05-11  49  46  37  47  45
2016-05-12  43  41  43  35  36
2016-05-13  51  43  39  54  44
2016-05-14  59  40  49  51  50
2016-05-15   2   0   3   4   3

mean looks like:

df.resample('D').mean().round(2)

               A     B     C     D     E
2016-05-10  1.96  2.25  1.83  2.21  1.92
2016-05-11  2.04  1.92  1.54  1.96  1.88
2016-05-12  1.79  1.71  1.79  1.46  1.50
2016-05-13  2.12  1.79  1.62  2.25  1.83
2016-05-14  2.46  1.67  2.04  2.12  2.08
2016-05-15  2.00  0.00  3.00  4.00  3.00

Upvotes: 1

Related Questions