Reputation: 3582
I'm new to Pandas (from R and data.table) and trying to learn best practices. I'd like to convert the following into as few steps as possible:
avg_data = dt[dt['datatype'] == 0].groupby(
[pd.TimeGrouper(key = 'time', freq = 'D')).agg('mean')
a = pd.DataFrame(moment_count)
a.columns = ['value']
a['time'] = a.index
In particular, I'd like to do a groupby and then end up with a DataFrame
that maintains a time
column rather than turning that into an index and which names the aggregate results a name I choose. The end result should be this:
from this:
What's the most efficient way to do this?
Upvotes: 0
Views: 62
Reputation: 294218
Since you didn't provide reproducible code, I created my own example.
The main point is to use resample
(documentation) with a 'D'
(for daily) frequency type. Then follow that up with an aggregation function. I used sum
.
import pandas as pd
import numpy as np
times = pd.date_range('2016-05-10', '2016-05-15', freq='H')
np.random.seed([3, 1415])
data = np.random.rand(len(times), 5)
df = pd.DataFrame(data * 5, index=times, columns=list('ABCDE'), dtype=np.int64)
df.resample('D').sum()
Looks like:
A B C D E
2016-05-10 47 54 44 53 46
2016-05-11 49 46 37 47 45
2016-05-12 43 41 43 35 36
2016-05-13 51 43 39 54 44
2016-05-14 59 40 49 51 50
2016-05-15 2 0 3 4 3
mean
looks like:
df.resample('D').mean().round(2)
A B C D E
2016-05-10 1.96 2.25 1.83 2.21 1.92
2016-05-11 2.04 1.92 1.54 1.96 1.88
2016-05-12 1.79 1.71 1.79 1.46 1.50
2016-05-13 2.12 1.79 1.62 2.25 1.83
2016-05-14 2.46 1.67 2.04 2.12 2.08
2016-05-15 2.00 0.00 3.00 4.00 3.00
Upvotes: 1