Reputation: 103
I used
output = result.set_index('times').groupby(pd.TimeGrouper('H')).mean()
and then
output.between_time('11:00', '12:00')
to keep only the two hours I want. What I'm having a hard time figuring out is how to select the 12:00:00 row for each day, and fall back to the 11:00:00 row when 12:00:00 isn't available. Here's what the output snippet looks like:
times A B C D
1996-12-04 11:00:00 NaN 0.081826 0.112259 0.134100
1996-12-04 12:00:00 NaN 0.080180 0.107108 0.126118
....
2017-01-15 11:00:00 0.246591 0.309864 0.332677 0.362805
2017-01-15 12:00:00 0.242433 0.301287 0.325492 0.355687
2017-01-16 11:00:00 0.131201 0.155804 0.170489 0.180293
2017-01-16 12:00:00 NaN NaN NaN NaN
2017-01-17 11:00:00 0.078308 0.093946 0.104750 0.110965
2017-01-17 12:00:00 0.083883 0.097341 0.108580 0.114755
2017-01-18 11:00:00 NaN NaN NaN NaN
2017-01-18 12:00:00 NaN NaN NaN NaN
2017-01-19 11:00:00 0.092868 0.109789 0.120100 0.125462
2017-01-19 12:00:00 0.098974 0.113243 0.125806 0.130909
And the solution I'm looking for:
times A B C D
1996-12-04 12:00:00 NaN 0.080180 0.107108 0.126118
2017-01-15 12:00:00 0.242433 0.301287 0.325492 0.355687
2017-01-16 11:00:00 0.131201 0.155804 0.170489 0.180293
2017-01-17 12:00:00 0.083883 0.097341 0.108580 0.114755
2017-01-19 12:00:00 0.098974 0.113243 0.125806 0.130909
I'm assuming I need to use a for loop with an if statement, but I'm just starting to learn Python so I haven't gotten the hang of it yet.
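For reference, the pipeline described above can be sketched like this with made-up data (pd.Grouper has replaced pd.TimeGrouper in current pandas; column names are assumptions):

```python
import pandas as pd

# Hypothetical raw data; the 'times' column name comes from the question,
# the values are made up.
result = pd.DataFrame({
    'times': pd.to_datetime(['2017-01-15 11:10', '2017-01-15 11:40',
                             '2017-01-15 12:20']),
    'A': [0.24, 0.25, 0.26],
})

# pd.Grouper(freq='h') is the current spelling of pd.TimeGrouper('H'):
# it bins the rows into hourly buckets before averaging.
output = result.set_index('times').groupby(pd.Grouper(freq='h')).mean()
output = output.between_time('11:00', '12:00')
print(output)
```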
Upvotes: 1
Views: 576
Reputation: 502
If you want to select rows by a value that is in your DataFrame's index, you can use
df.reset_index().set_index('times').loc['12:00:00']
Main drawback: the given value has to be in the index.
index A B C D
times
12:00:00 2017-01-15 0.242433 0.301287 0.325492 0.355687
12:00:00 2017-01-16 NaN NaN NaN NaN
12:00:00 2017-01-17 0.083883 0.097341 0.108580 0.114755
12:00:00 2017-01-18 NaN NaN NaN NaN
12:00:00 2017-01-19 0.098974 0.113243 0.125806 0.130909
If you want to give a time interval, you can do the same with a slice:
df.reset_index().set_index('times').loc['12:00:00':'13:00:00']
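For the label lookup above to work, the index has to hold the literal time-of-day values. A minimal sketch (data made up) that builds such an index from a DatetimeIndex like the one in the question:

```python
import pandas as pd
import numpy as np

# Toy frame standing in for the hourly means; values are made up.
idx = pd.to_datetime(['2017-01-15 11:00:00', '2017-01-15 12:00:00',
                      '2017-01-16 11:00:00', '2017-01-16 12:00:00'])
df = pd.DataFrame({'A': [0.24, 0.25, 0.13, np.nan]},
                  index=pd.Index(idx, name='times'))

# Move the timestamps into a column and re-key by time-of-day strings,
# so .loc can match the literal label '12:00:00'.
by_time = df.reset_index().set_index(df.index.strftime('%H:%M:%S'))
noon = by_time.loc['12:00:00']
print(noon)
```

pandas also offers df.at_time('12:00') directly on a DatetimeIndex, which avoids the string re-indexing.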
Upvotes: 0
Reputation: 862921
First, if necessary, remove the all-NaN rows with dropna.
Then groupby by DatetimeIndex.date and aggregate last.
The reset_index/set_index pair is necessary because the groupby keys are dates only, so the times would otherwise be lost from the index.
df = df.dropna(how='all', axis=0)  # drop rows where every column is NaN
# group by calendar date; last() keeps 12:00 when present, else 11:00
df = df.reset_index().groupby(df.index.date).last().set_index('times')
print (df)
A B C D
times
1996-12-04 12:00:00 NaN 0.080180 0.107108 0.126118
2017-01-15 12:00:00 0.242433 0.301287 0.325492 0.355687
2017-01-16 11:00:00 0.131201 0.155804 0.170489 0.180293
2017-01-17 12:00:00 0.083883 0.097341 0.108580 0.114755
2017-01-19 12:00:00 0.098974 0.113243 0.125806 0.130909
Upvotes: 1
Reputation: 5119
The thing you are looking for is the last row of each group. Imagine you added a column day and a column hour; then you could sort the values, group them by day, and take the last row of each group (12 if available, otherwise 11).
result = df.sort_values('hour').groupby('day').apply(lambda g: g.iloc[-1]).reset_index()
(Note g.iloc[-1] rather than g[-1]: plain [] indexing on a DataFrame selects columns, not rows.)
You can fill these columns beforehand or compute them on the fly, e.g.
df.groupby(df['date'].apply(lambda x: x.date()))
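Putting this together, a minimal runnable sketch (frame and values made up to match the question's shape):

```python
import pandas as pd

# Hypothetical data: 2017-01-16 has only the 11:00 row,
# 2017-01-17 has both 11:00 and 12:00.
idx = pd.to_datetime(['2017-01-16 11:00:00',
                      '2017-01-17 11:00:00', '2017-01-17 12:00:00'])
df = pd.DataFrame({'A': [0.13, 0.08, 0.09]},
                  index=pd.Index(idx, name='times'))

tmp = df.reset_index()
tmp['day'] = tmp['times'].dt.date
tmp['hour'] = tmp['times'].dt.hour

# After sorting by hour, the last row in each day's group is the 12:00
# row when it exists, and the 11:00 row otherwise.
result = (tmp.sort_values('hour')
             .groupby('day')
             .apply(lambda g: g.iloc[-1])
             .reset_index(drop=True))
print(result)
```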
Upvotes: 0