Reputation: 63
In pandas I have a dataset like this:
Value
2005-08-03 23:15:00 10.5
2005-08-03 23:30:00 10.0
2005-08-03 23:45:00 10.0
2005-08-04 00:00:00 10.5
2005-08-04 00:15:00 10.5
2005-08-04 00:30:00 11.0
2005-08-04 00:45:00 10.5
2005-08-04 01:00:00 11.0
...
2005-08-04 23:15:00 14.0
2005-08-04 23:30:00 13.5
2005-08-04 23:45:00 13.0
2005-08-05 00:00:00 13.5
2005-08-05 00:15:00 14.0
2005-08-05 00:30:00 14.0
2005-08-05 00:45:00 14.5
First, I wanted to group the data by date and store each group's max value in a new column. I used the following code for this task:
df['ValueMaxInGroup'] = df.groupby(pd.TimeGrouper('D'))['Value'].transform(max)
Now I want to create another column to store the previous group's max value, so the desired DataFrame would look like:
Value ValueMaxInGroup ValueMaxInPrevGroup
2005-08-03 23:15:00 10.5 10.5 NaN
2005-08-03 23:30:00 10.0 10.5 NaN
2005-08-03 23:45:00 10.0 10.5 NaN
2005-08-04 00:00:00 10.5 14.0 10.5
2005-08-04 00:15:00 10.5 14.0 10.5
2005-08-04 00:30:00 11.0 14.0 10.5
2005-08-04 00:45:00 10.5 14.0 10.5
2005-08-04 01:00:00 11.0 14.0 10.5
...
2005-08-04 23:15:00 14.0 14.0 10.5
2005-08-04 23:30:00 13.5 14.0 10.5
2005-08-04 23:45:00 13.0 14.0 10.5
2005-08-05 00:00:00 13.5 14.5 14.0
2005-08-05 00:15:00 14.0 14.5 14.0
2005-08-05 00:30:00 14.0 14.5 14.0
2005-08-05 00:45:00 14.5 14.5 14.0
To simply get the previous row's value, I used
df['ValueInPrevRow'] = df.shift(1)['Value']
Is there a way to get another group's min/max/f(x)? I tried
df['ValueMaxInPrevGroup'] = df.groupby(pd.TimeGrouper('D')).shift(1)['Value'].transform(max)
but it didn't work.
Upvotes: 1
Views: 890
Reputation: 879591
You could get the desired result by using groupby/agg, shift and merge:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Value': [10.5, 10.0, 10.0, 10.5, 10.5, 11.0, 10.5, 11.0, 14.0, 13.5, 13.0, 13.5, 14.0, 14.0, 14.5]}, index=['2005-08-03 23:15:00', '2005-08-03 23:30:00', '2005-08-03 23:45:00', '2005-08-04 00:00:00', '2005-08-04 00:15:00', '2005-08-04 00:30:00', '2005-08-04 00:45:00', '2005-08-04 01:00:00', '2005-08-04 23:15:00', '2005-08-04 23:30:00', '2005-08-04 23:45:00', '2005-08-05 00:00:00', '2005-08-05 00:15:00', '2005-08-05 00:30:00', '2005-08-05 00:45:00'])
df.index = pd.DatetimeIndex(df.index)
# This is equivalent to
# df['group'] = pd.to_datetime(df.index.date)
# when freq='D', but the version below works with any freq string, not just `'D'`.
grouped = df.groupby(pd.TimeGrouper('D'))
labels, uniqs, ngroups = grouped.grouper.group_info
df['group'] = grouped.grouper.binlabels[labels]
result = grouped[['Value']].agg(max)
result = result.rename(columns={'Value':'Max'})
result['PreviouMax'] = result['Max'].shift(1)
df = pd.merge(df, result, left_on=['group'], right_index=True)
print(df)
yields
Value group Max PreviouMax
2005-08-03 23:15:00 10.5 2005-08-03 10.5 NaN
2005-08-03 23:30:00 10.0 2005-08-03 10.5 NaN
2005-08-03 23:45:00 10.0 2005-08-03 10.5 NaN
2005-08-04 00:00:00 10.5 2005-08-04 14.0 10.5
2005-08-04 00:15:00 10.5 2005-08-04 14.0 10.5
2005-08-04 00:30:00 11.0 2005-08-04 14.0 10.5
2005-08-04 00:45:00 10.5 2005-08-04 14.0 10.5
2005-08-04 01:00:00 11.0 2005-08-04 14.0 10.5
2005-08-04 23:15:00 14.0 2005-08-04 14.0 10.5
2005-08-04 23:30:00 13.5 2005-08-04 14.0 10.5
2005-08-04 23:45:00 13.0 2005-08-04 14.0 10.5
2005-08-05 00:00:00 13.5 2005-08-05 14.5 14.0
2005-08-05 00:15:00 14.0 2005-08-05 14.5 14.0
2005-08-05 00:30:00 14.0 2005-08-05 14.5 14.0
2005-08-05 00:45:00 14.5 2005-08-05 14.5 14.0
The main idea here is to use groupby/agg instead of groupby/transform so that we obtain one row per group:
result = grouped[['Value']].agg(max)
result = result.rename(columns={'Value':'Max'})
result['PreviouMax'] = result['Max'].shift(1)
# Max PreviouMax
# group
# 2005-08-03 10.5 NaN
# 2005-08-04 14.0 10.5
# 2005-08-05 14.5 14.0
Then the desired DataFrame can be expressed as the result of merging df with result on the group date.
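For readers on a recent pandas release, where pd.TimeGrouper is no longer available, the same aggregate-then-shift idea can be sketched with pd.Grouper(freq='D'). This is only a minimal sketch under a few assumptions: the groups are calendar days, the column names are the ones from the question, and the per-day values are mapped back with reindex rather than merge (a substitution for brevity, not what the code above does).

```python
import pandas as pd

index = pd.to_datetime([
    '2005-08-03 23:15:00', '2005-08-03 23:30:00', '2005-08-03 23:45:00',
    '2005-08-04 00:00:00', '2005-08-04 00:15:00', '2005-08-04 00:30:00',
    '2005-08-04 00:45:00', '2005-08-04 01:00:00', '2005-08-04 23:15:00',
    '2005-08-04 23:30:00', '2005-08-04 23:45:00', '2005-08-05 00:00:00',
    '2005-08-05 00:15:00', '2005-08-05 00:30:00', '2005-08-05 00:45:00'])
df = pd.DataFrame({'Value': [10.5, 10.0, 10.0, 10.5, 10.5, 11.0, 10.5, 11.0,
                             14.0, 13.5, 13.0, 13.5, 14.0, 14.0, 14.5]},
                  index=index)

# Aggregate first: one row per calendar day holding that day's maximum.
daily_max = df.groupby(pd.Grouper(freq='D'))['Value'].max()

# Shift the per-day series so each day sees the previous day's maximum.
prev_daily_max = daily_max.shift(1)

# Map every row to its day (midnight timestamp) and look up both values.
day = df.index.floor('D')
df['ValueMaxInGroup'] = daily_max.reindex(day).to_numpy()
df['ValueMaxInPrevGroup'] = prev_daily_max.reindex(day).to_numpy()

print(df)
```

Replacing .max() with .min() or .agg(some_function) gives the previous group's min or f(x) in the same way. Note that a calendar day with no rows still appears as a NaN group in daily_max, so the day following such a gap gets NaN as its previous maximum.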
Upvotes: 2