Reputation: 3
So, I have a panda dataframe which has a column with the data and time and another column with a float value. now I want to take an average of the values in the second column if the date is the same irrespective of the time and want to display just one value for that date. can anyone help with that?
E.g-
**1st row**- Column A - '2018-12-08 04:51:58', Column B- '3', Column C- '4'
**2nd row**- Column A - '2018-12-08 04:51:49', Column B- '2', Column C- '3'
**3rd row**- Column A - '2018-12-08 04:51:57', Column B- '1', Column C- '2'
The output that I want is :
Column A- '2018-12-08', Column B- '2', Column C- '3'
P.S the dtype
for column A is datetime64[ns]
, and i have over 3k rows with different datetime and values
Can anyone help with that?
Upvotes: 0
Views: 60
Reputation: 75100
EDIT As you say your dataframe is datetime64[ns]
, something like:
date C
0 2018-09-14 00:00:00 4
1 2018-09-14 00:10:00 3
2 2018-09-14 01:10:00 2`
use:
df.groupby(df['date'].dt.date).mean().reset_index()
or :
df.groupby(df['date'].dt.floor('d')).mean().reset_index()
Output:
date C
0 2018-09-14 3
For getting the sum and dividing by 2:
df.groupby(df['date'].dt.date).apply(lambda x : x['C'].sum()/2).reset_index().rename(columns={0:'new'})
or (one more method which i just found)
df.groupby(df['date'].dt.floor('d')).apply(lambda x : x['C'].sum()/2).reset_index().rename(columns={0:'new'})
Output:
date new
0 2018-09-14 4.5
Upvotes: 1