user3062260
user3062260

Reputation: 1644

pandas get index values n positions ahead of selection

I have a dataframe with a datetime index. I also have a list of specific dates which I am interested in looking at in my dataframe. I would like to get the rows 'n' positions ahead of my list of specific dates. Say for the example n=5. Here is my code:

import pandas as pd     

# generate an example df
output = pd.DataFrame()    
d = pd.date_range(start='1/1/2000', end='1/1/2006', freq='D')
output['Date'] = d
output['Value'] = 1
output = output[output['Date'].dt.dayofweek < 5].reset_index(drop=True)  # remove weekends
output = output.set_index('Date')

# dates of interest
date_list = pd.to_datetime(['09/05/2002', '15/07/2004', '21/03/2005'], format='%d/%m/%Y')

# i can pull out the dates of interest, but I really want the dates '5' positions ahead
selection = output.iloc[output.index.isin(date_list)]
print(selection)

Please note, '5' positions ahead is not the same as timedelta(days=5)

I know this can be solved by iteration, something like:

for i, row in output.iterrows():
    for i2 in date_list:
        if i == i2:
            print(i,  output.loc[i2:].iloc[5])

But I am looking to do this ideally with a vectorized one liner. Any help would be much appreciated?

Many thanks in advance!

Upvotes: 3

Views: 316

Answers (3)

yatu
yatu

Reputation: 88236

You could use flatnonzero to get the indices, add 5 to them and index:

import numpy as np
output.iloc[np.flatnonzero(output.index[:-5].isin(date_list)) + 5]

             Value
Date             
2002-05-16      1
2004-07-22      1
2005-03-28      1

Or we also have pandas' nonzero:

output.iloc[output.index[:-5].isin(date_list).nonzero()[0]+5]

Value
Date             
2004-07-08      1
2005-03-14      1

Upvotes: 4

Guinther Kovalski
Guinther Kovalski

Reputation: 1909

if it is indexed by date this should work:


mask = pd.DataFrame(output.index.tolist,columns = ['a']).isin(date_list)

output[mask]

if you want to select 5 positions ahead, use

np.argwhere(np.array(mask)==True) +5

Upvotes: 1

Dave
Dave

Reputation: 2049

First get the row positions for your dates of interest, and offset by the amount you want. Then select from your dataframe using the offset row positions:

output.iloc[np.flatnonzero(ouput.index.isin(date_list)) + 5]

Upvotes: 1

Related Questions