Reputation: 395
I'm trying get the averages for the following times in the DataFrame named ave_data (shown below):
Currently my 'ave_data' DataFrame outputs the following data (please ignore all dashes, they are to align data only):
Time ---------------------- F1 ------------- F2 ------------ F3
2082-05-03 00:00:00 --- 43.005593 --- -56.509746 --- 25.271271
2082-05-03 01:00:00 --- 55.114918 --- -59.173852 --- 31.849262
2082-05-03 02:00:00 --- 63.990762 --- -64.699492 --- 52.426017
2082-05-03 03:00:00 --- 56.508333 --- -65.489083 --- 36.188083
2082-05-03 04:00:00 --- 36.217295 --- -59.198033 --- 2.404426
2082-05-03 05:00:00 --- 36.153814 --- -62.156187 --- 24.779830
2082-05-03 06:00:00 --- 93.920334 --- -57.923000 --- 77.654250
...
I would like to save these new averages as a new DataFrame that looks something like (the numbers below are random examples only):
Time -------------------- F1 ------------- F2 ------------ F3
Morning(5AM-9AM) --- 50.005987 --- -60.509746 --- 29.311271
Day(9AM-6PM) -------- 59.005987 --- -49.509746 --- 98.311271
Evening(6PM-10PM) -- 55.018887 --- -47.614622 --- 29.311271
Night (10PM-5AM) ---- 55.018887 --- -47.614622 --- 29.311271
Also, I would ideally like to add a last column that contains the average of each row. The code will be used to read different files that will generate a different number of columns than shown below, so if someone could help me develop a general method for this that would be great.
Below is the relevant section of the code I have:
raw_data = pd.read_excel(r'/Users/linnkloster/Desktop/Results/01_05_2012 Raw Results.xls', skiprows=1, header=0, nrows=1440, dayfirst=True, infer_datetime_format='%d/%m/%Y %H')
raw_data[u'Time']= pd.to_datetime(raw_data['Time'], unit='d')
# Converts first column to datetime, to make averaging easier
# Note this gets the wrong date (2082-05-03) but correct hour
raw_data.set_index(pd.DatetimeIndex(raw_data[u'Time']), inplace=True)
ave_data=raw_data.resample('h', how='mean')
print ave_data
Upvotes: 0
Views: 139
Reputation: 39297
You can apply a function that returns the category:
import pandas as pd
data = [('2082-05-03 00:00:00', 43.005593, -56.509746, 25.271271),
('2082-05-03 01:00:00', 55.114918, -59.173852, 31.849262),
('2082-05-03 02:00:00', 63.990762, -64.699492, 52.426017),
('2082-05-03 03:00:00', 56.508333, -65.489083, 36.188083),
('2082-05-03 04:00:00', 36.217295, -59.198033, 2.404426),
('2082-05-03 05:00:00', 36.153814, -62.156187, 24.779830),
('2082-05-03 06:00:00', 93.920334, -57.923000, 77.654250)]
df = pd.DataFrame(data = data, columns=['Time', 'F1', 'F2', 'F3'])
df.Time = pd.to_datetime(df.Time)
def time_cat(t):
hour = t.hour
if hour < 5:
return 'Night(10PM-5AM)'
if hour < 9:
return 'Morning(5AM-9AM)'
if hour < 18:
return 'Day(9AM-6PM)'
if hour < 22:
return 'Evening(6PM-10PM)'
# if hour >= 22:
return 'Night(10PM-5AM)'
df.groupby(df.Time.apply(time_cat)).mean()
F1 F2 F3
Time
Morning(5AM-9AM) 65.037074 -60.039594 51.217040
Night(10PM-5AM) 50.967380 -61.014041 29.627812
Upvotes: 1
Reputation: 1485
How about this? Note that I added three columns, so I use df
, a copy of raw_data
. If OK, you do not need a copy, of course.
def Time(hour):
if(hour>=5 and hour<9):
return 'Morning'
elif(hour>=9 and hour<18):
return 'Day'
elif(hour>=18 and hour<22):
return 'Evening'
else:
return 'Night'
df = raw_data.copy()
df['date'] = df.Time.apply(lambda time:time.date())
df['hour'] = df.Time.apply(lambda time:time.hour)
df['time']=df.hour.apply(Time)
ave_data=df.drop('hour',axis=1).groupby(['date','time']).mean()
Upvotes: 0