Reputation: 607
Let's consider the following DataFrame:
import pandas as pd

d = {'timestamp': ['2019-04-01', '2019-04-01', '2019-04-02', '2019-04-02', '2019-04-02'],
     'type': ['A', 'B', 'B', 'B', 'A'],
     'value': [3, 4, 4, 2, 5]}
df = pd.DataFrame(data=d)
    timestamp type  value
0  2019-04-01    A      3
1  2019-04-01    B      4
2  2019-04-02    B      4
3  2019-04-02    B      2
4  2019-04-02    A      5
What I would like to obtain is another column containing a metric of all the values within a particular time period and type. For instance, the standard deviation per type per day.
Upvotes: 1
Views: 41
Reputation: 863216
Use GroupBy.std:
# Assign to a new name so the original df stays available for the later examples
df1 = df.groupby(['timestamp', 'type'])['value'].std().reset_index()
print(df1)
    timestamp type     value
0  2019-04-01    A       NaN
1  2019-04-01    B       NaN
2  2019-04-02    A       NaN
3  2019-04-02    B  1.414214
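The question asks for the metric as another column on the original rows rather than as an aggregated table. One way to get that is GroupBy.transform, which returns a result aligned with the original index. The sketch below assumes the sample data from the question; the column name value_std is only illustrative. Groups with a single row give NaN because pandas computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

d = {'timestamp': ['2019-04-01', '2019-04-01', '2019-04-02', '2019-04-02', '2019-04-02'],
     'type': ['A', 'B', 'B', 'B', 'A'],
     'value': [3, 4, 4, 2, 5]}
df = pd.DataFrame(data=d)

# transform broadcasts each group's std back to every row of that group,
# so the result has the same length and index as df
# ('value_std' is a hypothetical column name chosen for this example)
df['value_std'] = df.groupby(['timestamp', 'type'])['value'].transform('std')
print(df)
```

Any other aggregation name accepted by transform, such as 'mean' or 'max', could be swapped in the same way.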
If you need multiple metrics, you can use DataFrameGroupBy.describe:
# Group the original df again; the result is indexed by (timestamp, type)
df2 = df.groupby(['timestamp', 'type'])['value'].describe()
print(df2)
                 count  mean       std  min  25%  50%  75%  max
timestamp  type
2019-04-01 A       1.0   3.0       NaN  3.0  3.0  3.0  3.0  3.0
           B       1.0   4.0       NaN  4.0  4.0  4.0  4.0  4.0
2019-04-02 A       1.0   5.0       NaN  5.0  5.0  5.0  5.0  5.0
           B       2.0   3.0  1.414214  2.0  2.5  3.0  3.5  4.0
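If only a few of these metrics are needed, passing a list of function names to agg is a possible alternative to describe. This is a minimal sketch on the question's sample data; the choice of mean, std and count is just an example, and the name stats is hypothetical.

```python
import pandas as pd

d = {'timestamp': ['2019-04-01', '2019-04-01', '2019-04-02', '2019-04-02', '2019-04-02'],
     'type': ['A', 'B', 'B', 'B', 'A'],
     'value': [3, 4, 4, 2, 5]}
df = pd.DataFrame(data=d)

# Each name in the list becomes one column of the result
# ('stats' is a hypothetical variable name for this example)
stats = df.groupby(['timestamp', 'type'])['value'].agg(['mean', 'std', 'count'])
print(stats)
```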
More information about aggregation is in Aggregation in pandas.
EDIT: If you need months only, use Series.dt.month:
# Convert the strings to datetimes so the .dt accessor is available
df['timestamp'] = pd.to_datetime(df['timestamp'])
df3 = df.groupby([df['timestamp'].dt.month.rename('months'), 'type'])['value'].describe()
print(df3)
             count      mean       std  min  25%  50%  75%  max
months type
4      A       2.0  4.000000  1.414214  3.0  3.5  4.0  4.5  5.0
       B       3.0  3.333333  1.154701  2.0  3.0  4.0  4.0  4.0
If you need years with months, use Series.dt.to_period for a monthly period:
# 'M' is the monthly period alias; timestamp must already be a datetime column
m = df['timestamp'].dt.to_period('M').rename('months')
df4 = df.groupby([m, 'type'])['value'].describe()
print(df4)
              count      mean       std  min  25%  50%  75%  max
months  type
2019-04 A       2.0  4.000000  1.414214  3.0  3.5  4.0  4.5  5.0
        B       3.0  3.333333  1.154701  2.0  3.0  4.0  4.0  4.0
Upvotes: 1