Advanced Slicing of Intervals in Pandas Dataframe

Question

I need to slice several intervals out of one dataframe whose DatetimeIndex has Freq: 120T. The start date of each interval is given by the index of a second dataframe (Freq: None). For each of those start dates I need to add a start time and a number of periods to build the interval. The start time and the number of periods are the same for all intervals.

Let's look at this mess with an example

Say our start time is '18:00:00' and our # of periods is 3. The dataframe to be sliced is df1 and the dataframe that contains the start dates is df2.

df1

                      A   B     
DateTime                                               
2005-09-06 16:00:00   1   5  
2005-09-06 18:00:00   2   6  
2005-09-06 20:00:00   3   7  
2005-09-06 22:00:00   4   8 
2005-12-07 16:00:00   9   8  
2005-12-07 18:00:00   7   6  
2005-12-07 20:00:00   5   4  
2005-12-07 22:00:00   3   2  


[2005-09-06 16:00:00, ..., 2005-12-07 22:00:00]
Length: 8, Freq: 120T, Timezone: None

df2

             Num
DateTime                                                                    
2005-09-06     1
2005-12-07     2


[2005-09-06, 2005-12-07]
Length: 2, Freq: None, Timezone: None

Desired Output:

df3 = func(source=df1['B'], start_dates=df2.index, time_start='18:00:00', periods=3)

           1   2
18:00:00   6   6  
20:00:00   7   4  
22:00:00   8   2

What I have tried, and considerations:

One difficulty is that the data in df1 has a 120T frequency but covers business days only. Taking this into account, I would do something like this:

# Somehow the '18:00' start time still has to be added to this date
start = df2.index[0]
# Somehow this has to be iterated over all the dates in the df2 index
df3 = df1['B'][(df1.index > start) & (df1.index < start + 3)]

I appreciate any insight

Thanks in advance

Andy Hayden · Accepted Answer

A completely different approach:

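import numpy as np
import pandas as pd

def next_n_asof(x, t, n):
    """The next n rows of x starting at the first index value >= t."""
    # argmax on a boolean array returns the position of the first True
    i = np.argmax(x.index >= t)
    return x.iloc[i:i + n]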

In [11]: next_n_asof(df1.B, pd.Timestamp('2005-09-06 18:00:00'), 3)
Out[11]:
2005-09-06 18:00:00    6
2005-09-06 20:00:00    7
2005-09-06 22:00:00    8
Name: B, dtype: int64
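
To see why the argmax step finds the starting row, here is a minimal sketch on the question's index. It also shows a caveat worth guarding against: when no timestamp is at or after t, the mask is all False and np.argmax still returns 0. The variable names below are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the index of df1 from the question (two-hour steps on two days).
index = pd.DatetimeIndex([
    "2005-09-06 16:00", "2005-09-06 18:00", "2005-09-06 20:00", "2005-09-06 22:00",
    "2005-12-07 16:00", "2005-12-07 18:00", "2005-12-07 20:00", "2005-12-07 22:00",
])

t = pd.Timestamp("2005-09-06 18:00:00")
mask = index >= t              # boolean array, one entry per row
first = int(np.argmax(mask))   # position of the first True

# Caveat: a start time past the end of the data gives an all-False mask,
# and argmax then returns 0, which would silently select the first rows.
late = pd.Timestamp("2006-01-01")
late_mask = index >= late
late_first = int(np.argmax(late_mask))
has_match = bool(late_mask.any())  # check this before trusting late_first
```

Checking mask.any() before slicing is one simple way to handle start times that fall after the last row.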

We can use this in a concat for each day in the index:

In [12]: pd.concat(next_n_asof(df1.B, t, 3)
                   for t in df2.index + pd.to_timedelta(18, unit='h'))
Out[12]:
2005-09-06 18:00:00    6
2005-09-06 20:00:00    7
2005-09-06 22:00:00    8
2005-12-07 18:00:00    6
2005-12-07 20:00:00    4
2005-12-07 22:00:00    2
Name: B, dtype: int64

We had to add the time to the dates in df2.index:

In [13]: df2.index + pd.to_timedelta(18, unit='h')
Out[13]:

[2005-09-06 18:00:00, 2005-12-07 18:00:00]
Length: 2, Freq: None, Timezone: None
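
The concat above gives one long Series, while the question asks for one column per start date and one row per time of day. A rough sketch of how the pieces could be wrapped into the func signature from the question follows. The name slice_intervals and the labels parameter are hypothetical, and the data is retyped from the question with 2005-09-06 as the first start date, as in Out[13].

```python
import numpy as np
import pandas as pd

def slice_intervals(source, start_dates, time_start, periods, labels=None):
    """Hypothetical helper: one column per start date, one row per period."""
    offset = pd.to_timedelta(time_start)        # '18:00:00' -> 18 hours
    columns = {}
    for k, day in enumerate(start_dates):
        t = day + offset
        i = int(np.argmax(source.index >= t))   # first row at or after t
        chunk = source.iloc[i:i + periods]
        label = k if labels is None else labels[k]
        # Re-index each chunk by time of day so the columns line up.
        columns[label] = pd.Series(chunk.to_numpy(), index=chunk.index.time)
    return pd.DataFrame(columns)

# Data retyped from the question.
df1 = pd.DataFrame(
    {"A": [1, 2, 3, 4, 9, 7, 5, 3], "B": [5, 6, 7, 8, 8, 6, 4, 2]},
    index=pd.DatetimeIndex([
        "2005-09-06 16:00", "2005-09-06 18:00", "2005-09-06 20:00", "2005-09-06 22:00",
        "2005-12-07 16:00", "2005-12-07 18:00", "2005-12-07 20:00", "2005-12-07 22:00",
    ]),
)
df2 = pd.DataFrame({"Num": [1, 2]},
                   index=pd.to_datetime(["2005-09-06", "2005-12-07"]))

df3 = slice_intervals(df1["B"], df2.index, "18:00:00", 3,
                      labels=df2["Num"].tolist())
print(df3)
```

This assumes every start date yields the same times of day. If a day has missing rows, the DataFrame constructor aligns on the union of times and fills the gaps with NaN.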

Note: I couldn't get this to work cleanly with asof itself, which may be more efficient.
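
On the efficiency point, one possible alternative (not asof itself) is a binary search with the index's searchsorted method, assuming the index is sorted in ascending order. The function name next_n_sorted is made up for this sketch.

```python
import pandas as pd

def next_n_sorted(x, t, n):
    """Sketch: like next_n_asof, but with a binary search on a sorted index."""
    # side="left" returns the first position whose index value is >= t
    i = x.index.searchsorted(t, side="left")
    return x.iloc[i:i + n]

# Column B of df1, retyped from the question.
b = pd.Series(
    [5, 6, 7, 8, 8, 6, 4, 2],
    index=pd.DatetimeIndex([
        "2005-09-06 16:00", "2005-09-06 18:00", "2005-09-06 20:00", "2005-09-06 22:00",
        "2005-12-07 16:00", "2005-12-07 18:00", "2005-12-07 20:00", "2005-12-07 22:00",
    ]),
    name="B",
)

result = next_n_sorted(b, pd.Timestamp("2005-09-06 18:00:00"), 3)
past_end = next_n_sorted(b, pd.Timestamp("2006-01-01"), 3)  # empty Series
```

Unlike the argmax version, a start time past the end of the data gives an empty result here, because searchsorted returns len(index) in that case.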
