Reputation: 103
I used
output = result.set_index('times').groupby(pd.TimeGrouper('H')).mean()
and then
output.between_time('11:00', '12:00')
to keep only the two hours I want. What I'm having a hard time figuring out is how to select the 12:00:00 row for each day, and fall back to the 11:00:00 row when 12:00:00 isn't available. Here's what the output snippet looks like:
times A B C D
1996-12-04 11:00:00 NaN 0.081826 0.112259 0.134100
1996-12-04 12:00:00 NaN 0.080180 0.107108 0.126118
....
2017-01-15 11:00:00 0.246591 0.309864 0.332677 0.362805
2017-01-15 12:00:00 0.242433 0.301287 0.325492 0.355687
2017-01-16 11:00:00 0.131201 0.155804 0.170489 0.180293
2017-01-16 12:00:00 NaN NaN NaN NaN
2017-01-17 11:00:00 0.078308 0.093946 0.104750 0.110965
2017-01-17 12:00:00 0.083883 0.097341 0.108580 0.114755
2017-01-18 11:00:00 NaN NaN NaN NaN
2017-01-18 12:00:00 NaN NaN NaN NaN
2017-01-19 11:00:00 0.092868 0.109789 0.120100 0.125462
2017-01-19 12:00:00 0.098974 0.113243 0.125806 0.130909
And the solution I'm looking for:
times A B C D
1996-12-04 12:00:00 NaN 0.080180 0.107108 0.126118
2017-01-15 12:00:00 0.242433 0.301287 0.325492 0.355687
2017-01-16 11:00:00 0.131201 0.155804 0.170489 0.180293
2017-01-17 12:00:00 0.083883 0.097341 0.108580 0.114755
2017-01-19 12:00:00 0.098974 0.113243 0.125806 0.130909
I'm assuming I need to use a for loop with an if statement, but I'm just starting to learn Python so I haven't gotten the hang of it yet.
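For reference, the pipeline described above can be sketched like this with made-up data (pd.Grouper has replaced pd.TimeGrouper in current pandas; column names are assumptions):

```python
import pandas as pd

# Hypothetical raw data; the 'times' column name comes from the question,
# the values are made up.
result = pd.DataFrame({
    'times': pd.to_datetime(['2017-01-15 11:10', '2017-01-15 11:40',
                             '2017-01-15 12:20']),
    'A': [0.24, 0.25, 0.26],
})

# pd.Grouper(freq='h') is the current spelling of pd.TimeGrouper('H'):
# it bins the rows into hourly buckets before averaging.
output = result.set_index('times').groupby(pd.Grouper(freq='h')).mean()
output = output.between_time('11:00', '12:00')
print(output)
```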
Upvotes: 1
Views: 576
Reputation: 502
If you want to select rows by a value that is in your DataFrame's index, you can use
df.reset_index().set_index('times').loc['12:00:00']
Main drawback: the given value has to be in the index.
index A B C D
times
12:00:00 2017-01-15 0.242433 0.301287 0.325492 0.355687
12:00:00 2017-01-16 NaN NaN NaN NaN
12:00:00 2017-01-17 0.083883 0.097341 0.108580 0.114755
12:00:00 2017-01-18 NaN NaN NaN NaN
12:00:00 2017-01-19 0.098974 0.113243 0.125806 0.130909
If you want to give a time interval, you can do the same with a slice:
df.reset_index().set_index('times').loc['12:00:00':'13:00:00']
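For the label lookup above to work, the index has to hold the literal time-of-day values. A minimal sketch (data made up) that builds such an index from a DatetimeIndex like the one in the question:

```python
import pandas as pd
import numpy as np

# Toy frame standing in for the hourly means; values are made up.
idx = pd.to_datetime(['2017-01-15 11:00:00', '2017-01-15 12:00:00',
                      '2017-01-16 11:00:00', '2017-01-16 12:00:00'])
df = pd.DataFrame({'A': [0.24, 0.25, 0.13, np.nan]},
                  index=pd.Index(idx, name='times'))

# Move the timestamps into a column and re-key by time-of-day strings,
# so .loc can match the literal label '12:00:00'.
by_time = df.reset_index().set_index(df.index.strftime('%H:%M:%S'))
noon = by_time.loc['12:00:00']
print(noon)
```

pandas also offers df.at_time('12:00') directly on a DatetimeIndex, which avoids the string re-indexing.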
Upvotes: 0
Reputation: 862921
First, if necessary, remove the all-NaN rows with dropna.
Then groupby by DatetimeIndex.date and aggregate last.
The reset_index/set_index pair is necessary because the groupby keys are dates only, so the times would otherwise be lost from the index.
df = df.dropna(how='all', axis=0)  # drop rows where every column is NaN
# group by calendar date; last() keeps 12:00 when present, else 11:00
df = df.reset_index().groupby(df.index.date).last().set_index('times')
print (df)
A B C D
times
1996-12-04 12:00:00 NaN 0.080180 0.107108 0.126118
2017-01-15 12:00:00 0.242433 0.301287 0.325492 0.355687
2017-01-16 11:00:00 0.131201 0.155804 0.170489 0.180293
2017-01-17 12:00:00 0.083883 0.097341 0.108580 0.114755
2017-01-19 12:00:00 0.098974 0.113243 0.125806 0.130909
Upvotes: 1
Reputation: 5119
The thing you are looking for is the last row of each group. Imagine you added a column day and a column hour; then you could sort the values, group them by day, and take the last row of each group (12 if available, otherwise 11).
result = df.sort_values('hour').groupby('day').apply(lambda g: g.iloc[-1]).reset_index()
(Note g.iloc[-1] rather than g[-1]: plain [] indexing on a DataFrame selects columns, not rows.)
You can fill these columns beforehand or compute them on the fly, e.g.
df.groupby(df['date'].apply(lambda x: x.date()))
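Putting this together, a minimal runnable sketch (frame and values made up to match the question's shape):

```python
import pandas as pd

# Hypothetical data: 2017-01-16 has only the 11:00 row,
# 2017-01-17 has both 11:00 and 12:00.
idx = pd.to_datetime(['2017-01-16 11:00:00',
                      '2017-01-17 11:00:00', '2017-01-17 12:00:00'])
df = pd.DataFrame({'A': [0.13, 0.08, 0.09]},
                  index=pd.Index(idx, name='times'))

tmp = df.reset_index()
tmp['day'] = tmp['times'].dt.date
tmp['hour'] = tmp['times'].dt.hour

# After sorting by hour, the last row in each day's group is the 12:00
# row when it exists, and the 11:00 row otherwise.
result = (tmp.sort_values('hour')
             .groupby('day')
             .apply(lambda g: g.iloc[-1])
             .reset_index(drop=True))
print(result)
```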
Upvotes: 0