Reputation: 2632
I am still learning my way around python, and this is somewhat a complicated question,
having a table pandas.DataFrame
like this:
SAMPLE_TIME TempBottom TempTop TempOut State Bypass
0 2015-07-15 16:41:56 48.625 55.812 43.875 1 1
1 2015-07-15 16:42:55 48.750 55.812 43.875 1 1
2 2015-07-15 16:43:55 48.937 55.812 43.875 1 1
3 2015-07-15 16:44:56 49.125 55.812 43.812 1 1
4 2015-07-15 16:45:55 49.312 55.812 43.812 1 1
this is a big data set with entries every minutes over few weeks. I am trying to get the range for every day, so basically ignoring the time and splitting by days
I forgot to mention that this was imported from csv using pd.read_csv()
which i think means that SMAPLE_TIME
is not a DatetimeIndex
Upvotes: 2
Views: 101
Reputation: 862451
You can try:
#set to datetimeindex
df['SAMPLE_TIME'] = pd.to_datetime(df['SAMPLE_TIME'])
print df
SAMPLE_TIME TempBottom TempTop TempOut State Bypass
0 2015-07-05 16:41:56 48.625 55.812 43.875 1 1
1 2015-07-05 16:42:55 48.750 55.812 43.875 1 1
2 2015-07-23 16:43:55 48.937 55.812 43.875 1 1
3 2015-07-23 16:44:56 49.125 55.812 43.812 1 1
4 2015-07-25 16:45:55 49.312 55.812 43.812 1 1
df = df.set_index('SAMPLE_TIME')
g1 = df.groupby(lambda x: x.day)
for d,g in g1:
print d
print g
5
TempBottom TempTop TempOut State Bypass
SAMPLE_TIME
2015-07-05 16:41:56 48.625 55.812 43.875 1 1
2015-07-05 16:42:55 48.750 55.812 43.875 1 1
23
TempBottom TempTop TempOut State Bypass
SAMPLE_TIME
2015-07-23 16:43:55 48.937 55.812 43.875 1 1
2015-07-23 16:44:56 49.125 55.812 43.812 1 1
25
TempBottom TempTop TempOut State Bypass
SAMPLE_TIME
2015-07-25 16:45:55 49.312 55.812 43.812 1 1
Or you can group by day and aggregate by sum:
df = df.set_index('SAMPLE_TIME')
g1 = df.groupby(lambda x: x.day).agg(sum)
print g1
TempBottom TempTop TempOut State Bypass
5 97.375 111.624 87.750 2 2
23 98.062 111.624 87.687 2 2
25 49.312 55.812 43.812 1 1
Or group by year, month and day and aggregate by sum:
df['SAMPLE_TIME'] = pd.to_datetime(df['SAMPLE_TIME'])
df = df.set_index('SAMPLE_TIME')
g1 = df.groupby([lambda x: x.year, lambda x: x.month, lambda x: x.day]).agg(sum)
print g1
TempBottom TempTop TempOut State Bypass
2015 7 5 97.375 111.624 87.750 2 2
23 98.062 111.624 87.687 2 2
25 49.312 55.812 43.812 1 1
Upvotes: 2
Reputation: 42885
You can
df['SAMPLE_TIME'] = pd.to_datetime(df['SAMPLE_TIME'])
df.set_index('SAMPLE_TIME', inplace=True)
df_by_days = df.groupby(pd.TimeGrouper('D')).agg()
to apply various aggregation functions as described in the docs. If you provide some detail on what you'd like to aggregate and how, happy to add an example.
Upvotes: 2