Reputation: 5569
I have a pandas DataFrame df:
                    StartDate                     EndDate  Value
0  2015-03-25 12:25:43.999994  2015-03-25 13:23:43.979992      0
1  2015-03-25 13:23:43.999998  2015-03-25 13:24:43.979998      1
2  2015-03-25 13:24:43.999994  2015-03-25 13:25:43.979995      0
3  2015-03-26 13:25:44.000001  2015-03-26 13:47:43.979996      0
4  2015-03-26 13:47:43.999992  2015-03-26 13:48:43.979993      1
5  2015-03-26 13:48:43.999999  2015-03-26 14:25:43.980001      0
6  2015-03-27 14:25:43.999997  2015-03-27 15:25:43.979998      0
7  2015-03-27 15:25:43.999994  2015-03-27 15:28:43.979997      0
8  2015-03-27 15:28:43.999993  2015-03-27 15:29:43.979994      1
9  2015-03-27 15:29:44.000000  2015-03-27 15:59:43.979997      0
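If df was read from a text or CSV file, StartDate and EndDate may still be strings, and both t.date() and the .dt accessor need datetime64 values. A minimal sketch that rebuilds the frame from the printed values (the rows list is only for illustration) and converts the two columns with pd.to_datetime:

```python
import pandas as pd

# Rows copied from the printed frame: (StartDate, EndDate, Value)
rows = [
    ("2015-03-25 12:25:43.999994", "2015-03-25 13:23:43.979992", 0),
    ("2015-03-25 13:23:43.999998", "2015-03-25 13:24:43.979998", 1),
    ("2015-03-25 13:24:43.999994", "2015-03-25 13:25:43.979995", 0),
    ("2015-03-26 13:25:44.000001", "2015-03-26 13:47:43.979996", 0),
    ("2015-03-26 13:47:43.999992", "2015-03-26 13:48:43.979993", 1),
    ("2015-03-26 13:48:43.999999", "2015-03-26 14:25:43.980001", 0),
    ("2015-03-27 14:25:43.999997", "2015-03-27 15:25:43.979998", 0),
    ("2015-03-27 15:25:43.999994", "2015-03-27 15:28:43.979997", 0),
    ("2015-03-27 15:28:43.999993", "2015-03-27 15:29:43.979994", 1),
    ("2015-03-27 15:29:44.000000", "2015-03-27 15:59:43.979997", 0),
]
df = pd.DataFrame(rows, columns=["StartDate", "EndDate", "Value"])

# Parse the strings into datetime64 columns; .dt.date and Timestamp.date()
# both require real timestamps rather than strings
df["StartDate"] = pd.to_datetime(df["StartDate"])
df["EndDate"] = pd.to_datetime(df["EndDate"])

print(df.dtypes)
```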
I would like to perform some operation day by day, so I want to extract a sub-DataFrame containing only the rows that belong to the first day, then the rows that belong to the second day, and so on.
My plan was to use a for loop and, at each iteration, select the rows of one particular day.
First I compute the unique days:
unique_days = df['StartDate'].map(lambda t: t.date()).unique()
and then loop over them:
# for each day, compute the operation
for i in unique_days:
    print(i)
    df_day = df[df['StartDate'].map(lambda t: t.date()) == i]
    df2 = func(df_day, parameters)
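As written, df2 is overwritten on every pass, so only the last day's result survives, and the day of every row is recomputed with map on each iteration. A minimal sketch of the same mask-based loop that computes the day keys once with the vectorized .dt.date accessor and stores each result in a dict; func here is a hypothetical stand-in (it just sums Value times parameters), since the real operation isn't shown:

```python
import pandas as pd

df = pd.DataFrame({
    "StartDate": pd.to_datetime([
        "2015-03-25 12:25:43.999994",
        "2015-03-25 13:23:43.999998",
        "2015-03-26 13:25:44.000001",
        "2015-03-27 15:28:43.999993",
    ]),
    "Value": [0, 1, 0, 1],
})

def func(df_day, parameters):
    # Hypothetical stand-in for the real per-day operation
    return df_day["Value"].sum() * parameters

# Compute each row's calendar day once (vectorized equivalent of map(lambda t: t.date()))
days = df["StartDate"].dt.date

results = {}
for day in days.unique():
    df_day = df[days == day]        # rows whose StartDate falls on this day
    results[day] = func(df_day, 1)  # keep every day's result instead of overwriting

print(results)
```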
Upvotes: 1
Views: 66
Reputation: 862741
I think the best approach is groupby by dates, combined with an aggregation function such as mean or sum, or with apply and a custom function:
df1 = df.groupby(df['StartDate'].dt.date)['Value'].mean()
df2 = df.groupby(df['StartDate'].dt.date).apply(func)
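Iterating over the GroupBy object directly gives the per-day loop the question asks for: each step yields the date and the sub-DataFrame for that date, with no separate unique() or boolean-mask step. A short sketch on a few of the rows above; the per_day dict is illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "StartDate": pd.to_datetime([
        "2015-03-25 12:25:43.999994",
        "2015-03-25 13:23:43.999998",
        "2015-03-26 13:25:44.000001",
        "2015-03-27 15:28:43.999993",
    ]),
    "Value": [0, 1, 0, 1],
})

per_day = {}
# Each iteration yields (date, sub-DataFrame); groups come out sorted by date
for day, df_day in df.groupby(df["StartDate"].dt.date):
    per_day[day] = len(df_day)
    print(day, len(df_day))
```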
Sample:
import pandas as pd

# some sample function
def func(df_day, parameters):
    # print each group
    print (df_day)
    return df_day['StartDate'] - pd.Timedelta(parameters, unit='d')

df2 = df.groupby(df['StartDate'].dt.date).apply(lambda x: func(x, 1))
# less readable: extra positional arguments of apply are passed on to func
#df2 = df.groupby(df['StartDate'].dt.date).apply(func, 1)
print (df2)
StartDate
2015-03-25  0   2015-03-24 12:25:43.999994
            1   2015-03-24 13:23:43.999998
            2   2015-03-24 13:24:43.999994
2015-03-26  3   2015-03-25 13:25:44.000001
            4   2015-03-25 13:47:43.999992
            5   2015-03-25 13:48:43.999999
2015-03-27  6   2015-03-26 14:25:43.999997
            7   2015-03-26 15:25:43.999994
            8   2015-03-26 15:28:43.999993
            9   2015-03-26 15:29:44.000000
Name: StartDate, dtype: datetime64[ns]
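In the output above, apply adds the group key (the date) as an outer index level in front of the original row labels; the exact index behavior of apply has varied somewhat between pandas versions. If the result should line up with df's original index instead, passing group_keys=False to groupby leaves that level out. A sketch reusing the sample func on a few rows:

```python
import pandas as pd

df = pd.DataFrame({
    "StartDate": pd.to_datetime([
        "2015-03-25 12:25:43.999994",
        "2015-03-25 13:23:43.999998",
        "2015-03-26 13:25:44.000001",
        "2015-03-27 15:28:43.999993",
    ]),
    "Value": [0, 1, 0, 1],
})

def func(df_day, parameters):
    # Same sample function as in the answer: shift StartDate back by `parameters` days
    return df_day["StartDate"] - pd.Timedelta(parameters, unit="d")

# group_keys=False: do not prepend the date as an outer index level,
# so the result keeps df's original row labels
shifted = df.groupby(df["StartDate"].dt.date, group_keys=False).apply(lambda x: func(x, 1))
print(shifted)
```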
Upvotes: 3