Reputation: 707
I have a dataframe like the one below:
ticker    fy  fp  value  f_date               rn
MSFT    2009   0    144  2010-01-01T12:12:34   0
AAPL    2010   0    144  2010-01-01T12:12:34   0
MSFT    2009   0     48  2014-05-01T12:12:34   1
AAPL    2011   0     80  2012-01-01T12:12:34   1
GOOG    2010   0     40  2010-01-01T12:12:34   0
I want to group this data by ticker, fy, and fp, like below:
df.groupby(by=['ticker', 'fy', 'fp'])
For each group, I want to take the difference between the max and min of f_date and divide it by the max of rn plus one. For example, for the group MSFT, 2009, 0, the max date is 2014-05-01T12:12:34, the min date is 2010-01-01T12:12:34, and the max rn is 1, so I want to calculate (max(f_date) - min(f_date)) / (max(rn) + 1). That gives me the number of days between these two dates, which I can then map against other data for some analysis.
I'm unable to move forward after the groupby.
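For reference, a minimal sketch that reconstructs the sample frame above (the column dtypes are assumed):
import pandas as pd

# reconstruct the sample data shown in the question; dtypes are an assumption
df = pd.DataFrame({
    'ticker': ['MSFT', 'AAPL', 'MSFT', 'AAPL', 'GOOG'],
    'fy':     [2009, 2010, 2009, 2011, 2010],
    'fp':     [0, 0, 0, 0, 0],
    'value':  [144, 144, 48, 80, 40],
    'f_date': ['2010-01-01T12:12:34', '2010-01-01T12:12:34',
               '2014-05-01T12:12:34', '2012-01-01T12:12:34',
               '2010-01-01T12:12:34'],
    'rn':     [0, 0, 1, 1, 0],
})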
Upvotes: 1
Views: 1728
Reputation: 862601
For pandas 0.25+ it is possible to use named aggregation, then subtract and divide the resulting columns:
df['f_date'] = pd.to_datetime(df['f_date'])
df = df.groupby(by=['ticker', 'fy', 'fp']).agg(min1=('f_date','min'),
                                               max1=('f_date','max'),
                                               rn=('rn', 'max'))
df['new'] = df['max1'].sub(df['min1']).div(df['rn'].add(1))
print (df)
                                  min1                max1  rn               new
ticker fy   fp
AAPL   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0   0 days 00:00:00
       2011 0  2012-01-01 12:12:34 2012-01-01 12:12:34   1   0 days 00:00:00
GOOG   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0   0 days 00:00:00
MSFT   2009 0  2010-01-01 12:12:34 2014-05-01 12:12:34   1 790 days 12:00:00
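As a quick check of the MSFT, 2009, 0 group: 2014-05-01 12:12:34 minus 2010-01-01 12:12:34 is 1581 days, and 1581 / (1 + 1) = 790.5 days, i.e. the 790 days 12:00:00 shown above.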
Or, if necessary, convert the difference of datetimes (a timedelta) to seconds with Series.dt.total_seconds:
df['new'] = df['max1'].sub(df['min1']).dt.total_seconds().div(df['rn'].add(1))
print (df)
                                  min1                max1  rn         new
ticker fy   fp
AAPL   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0         0.0
       2011 0  2012-01-01 12:12:34 2012-01-01 12:12:34   1         0.0
GOOG   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0         0.0
MSFT   2009 0  2010-01-01 12:12:34 2014-05-01 12:12:34   1  68299200.0
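If the gap is needed in fractional days rather than seconds, one possible variant is to divide the timedelta by pd.Timedelta('1D') (a sketch; the new_days column name is mine):
# dividing a timedelta Series by a Timedelta yields floats (here: days)
df['new_days'] = df['max1'].sub(df['min1']).div(df['rn'].add(1)) / pd.Timedelta('1D')
# MSFT, 2009, 0 -> 790.5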
Solution for older pandas versions:
df['f_date'] = pd.to_datetime(df['f_date'])
df = df.groupby(by=['ticker', 'fy', 'fp']).agg({'f_date':['min','max'],
                                                'rn':'max'})
df.columns = df.columns.map('_'.join)
df['new'] = df['f_date_max'].sub(df['f_date_min']).div(df['rn_max'].add(1))
print (df)
                                f_date_min          f_date_max  rn_max  \
ticker fy   fp
AAPL   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34       0
       2011 0  2012-01-01 12:12:34 2012-01-01 12:12:34       1
GOOG   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34       0
MSFT   2009 0  2010-01-01 12:12:34 2014-05-01 12:12:34       1

                            new
ticker fy   fp
AAPL   2010 0   0 days 00:00:00
       2011 0   0 days 00:00:00
GOOG   2010 0   0 days 00:00:00
MSFT   2009 0 790 days 12:00:00
Last, if necessary, convert the MultiIndex to columns:
df = df.reset_index()
print (df)
  ticker    fy  fp          f_date_min          f_date_max  rn_max  \
0   AAPL  2010   0 2010-01-01 12:12:34 2010-01-01 12:12:34       0
1   AAPL  2011   0 2012-01-01 12:12:34 2012-01-01 12:12:34       1
2   GOOG  2010   0 2010-01-01 12:12:34 2010-01-01 12:12:34       0
3   MSFT  2009   0 2010-01-01 12:12:34 2014-05-01 12:12:34       1

                new
0   0 days 00:00:00
1   0 days 00:00:00
2   0 days 00:00:00
3  790 days 12:00:00
Upvotes: 3