There's a pandas DataFrame like the one shown below:
Bank date creationdate
0 JP Morgan 2010-07-22 2010-07-22 12:17:38.187000
1 JP Morgan 2010-07-31 2010-07-31 12:41:57.083000
2 JP Morgan 2010-11-18 2010-11-18 19:24:15.503000
3 JP Morgan 2011-03-08 2011-03-08 18:57:31.477000
4 JP Morgan 2011-04-27 2011-04-27 13:13:01.357000
5 JP Morgan 2011-05-01 2011-05-01 17:19:28.773000
6 JP Morgan 2011-05-06 2011-05-06 19:40:51.757000
7 JP Morgan 2011-05-10 2011-05-10 01:14:52.503000
8 JP Morgan 2011-05-23 2011-05-23 20:36:18.490000
9 JP Morgan 2011-05-25 2011-05-25 15:51:08.650000
10 JP Morgan 2011-05-28 2011-05-28 21:08:30.270000
11 JP Morgan 2011-05-29 2011-05-29 04:18:26.693000
12 JP Morgan 2011-06-03 2011-06-03 16:54:13.770000
13 JP Morgan 2011-06-08 2011-06-08 18:35:50.450000
14 JP Morgan 2011-06-08 2011-06-08 18:37:12.493000
15 JP Morgan 2011-06-08 2011-06-08 18:37:45.593000
I want to find the mean of the differences between consecutive creationdate values within each date. To do this, I group by bank and date, call diff, and then call mean on the grouped data:
df_grouped = date_df.groupby(['Bank', 'date'], as_index=False)
mean = df_grouped['creationdate'].diff().mean()
However, this gives me a single mean over all the differences instead of one mean per date.
How can I get the mean of the differences for each date?
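To see why the two-step version collapses to a single number, here is a minimal sketch in Python 3. It assumes the column names from the printed frame (`Bank`, `creationdate`), rebuilds only the last four sample rows, and derives `date` with `dt.normalize()`, which is an assumption about how that column was produced.

```python
import pandas as pd

# The last four rows of the sample data above (a subset, not the full frame)
df = pd.DataFrame({
    "Bank": "JP Morgan",
    "creationdate": pd.to_datetime([
        "2011-06-03 16:54:13.770",
        "2011-06-08 18:35:50.450",
        "2011-06-08 18:37:12.493",
        "2011-06-08 18:37:45.593",
    ]),
})
# Assumption: "date" is the creationdate truncated to midnight
df["date"] = df["creationdate"].dt.normalize()

# GroupBy.diff() behaves like a transform: it returns one value per
# ORIGINAL row (NaT for the first row of each (Bank, date) group)
gaps = df.groupby(["Bank", "date"])["creationdate"].diff()
print(gaps)

# Calling .mean() on that row-aligned Series averages across all groups,
# so the result is one Timedelta rather than one value per date
overall = gaps.mean()
print(overall)
```

The differences themselves are correct; the problem is only that nothing regroups them before the mean is taken.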
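I think you can do this in one step with .aggregate, rather than in two steps. Your version doesn't work because diff() on a GroupBy returns one value per original row rather than one per group, so the .mean() that follows averages the whole column. Aggregating with a function that computes the differences and their mean reduces each group to a single value:
import numpy as np
print(df_grouped['creationdate'].aggregate(lambda x: np.diff(x).mean()))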
Bank date creationdate
0 JP Morgan 2010-07-22 NaT
1 JP Morgan 2010-07-31 NaT
2 JP Morgan 2010-11-18 NaT
3 JP Morgan 2011-03-08 NaT
4 JP Morgan 2011-04-27 NaT
5 JP Morgan 2011-05-01 NaT
6 JP Morgan 2011-05-06 NaT
7 JP Morgan 2011-05-10 NaT
8 JP Morgan 2011-05-23 NaT
9 JP Morgan 2011-05-25 NaT
10 JP Morgan 2011-05-28 NaT
11 JP Morgan 2011-05-29 NaT
12 JP Morgan 2011-06-03 NaT
13 JP Morgan 2011-06-08 00:00:57.571500
In the sample data you showed, only 2011-06-08 has more than one value, so it is the only date whose result is not NaT (a date with a single entry has no differences to average).
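A self-contained Python 3 sketch of the same one-step idea follows. It swaps `np.diff` for pandas' own `Series.diff`, sorts by `creationdate` first (an assumption; the sample is already in order), and reuses only the last four sample rows. It also shows the two-step form done correctly, by grouping the row-aligned `diff()` result a second time, and how to drop the NaT rows for dates with a single entry.

```python
import pandas as pd

# The last four rows of the sample data from the question (a subset)
df = pd.DataFrame({
    "Bank": "JP Morgan",
    "creationdate": pd.to_datetime([
        "2011-06-03 16:54:13.770",
        "2011-06-08 18:35:50.450",
        "2011-06-08 18:37:12.493",
        "2011-06-08 18:37:45.593",
    ]),
})
# Assumption: "date" is the creationdate truncated to midnight
df["date"] = df["creationdate"].dt.normalize()
# Sort so that diff() measures gaps between consecutive creationdates
df = df.sort_values("creationdate")

# One step: reduce each (Bank, date) group to its mean gap.
# Series.diff() yields NaT for the first row and Series.mean() skips NaT,
# so a group with a single row comes out as NaT.
mean_gap = (
    df.groupby(["Bank", "date"])["creationdate"]
      .agg(lambda s: s.diff().mean())
)
print(mean_gap)

# Two steps, done correctly: regroup the row-aligned diff() by the same keys
gaps = df.groupby(["Bank", "date"])["creationdate"].diff()
two_step = gaps.groupby([df["Bank"], df["date"]]).mean()
print(two_step)

# Keep only the dates that had more than one entry
print(mean_gap.dropna())
```

For 2011-06-08 the two gaps are 82.043 s and 33.100 s, so their mean is 57.5715 s, which matches the `00:00:57.571500` shown in the output above.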