John M
John M

Reputation: 23

Using pandas or Numpy select a range of rows based on column data

Hello I have multiple csvs of stocks I generated from pandas. My goal seems quite simple, I measure the percentage difference in volume day by day

    for ticker in tickers:
        df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
        df['Volume_Pct_Change'] = df['Volume'].pct_change().fillna(0)
        df.to_csv('stonks_dfs/{}.csv'.format(ticker))

This was pretty easy. Now I want to find all the times the Volume is above 150%. I did this in another python script different from pulling the original data.

with open('sp500tickers.pickle', "rb") as f:
        tickers = pickle.load(f)

    for ticker in tickers:
        df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
        df_vpc = df.loc[df['Volume_Pct_Change'] >= 1.5]
        df_vpc.to_csv('{}.csv'.format(ticker))

This works pretty good for me.

My problem is I want to pull a X amount of rows before and after the selected row ie (7 rows before and 30 rows after the selected row that has the Volume_Pct_change >= 1.5) that way I can graph out and see changes over time before and after the volume spike. Bonus for me if there is a way to use Numpy with this.

Edit 1:So running

df = pd.read_csv('AAPL.csv') c=df['Volume_Pct_Change'] >= 1.2 for idx in c: df.iloc[(idx-7):(idx+30)]

gives c the correct output of true when the condition exists

114,False 115,False 116,False 117,False 118,True 119,False 120,False 121,False

But after running df.iloc[(idx-7):(idx+30)] I see no change in df when I export it to a csv.

Edit 2: ok getting somewhere running

df = pd.read_csv('AAPL.csv') c = df.index[df['Volume_Pct_Change'] >= 1.2] for idx in c: df.iloc[(idx-7):(idx+30)]

I see no difference in df when exporting but putting d=df.iloc[(idx-7):(idx+30)] works but only has one set of them in the dataframe which is confusing to me. Its only saving the last one. It seems it loops through overwriting the dataframe with everyone ending with the last one?

Final Edit: Thanks everyone for your help

df = pd.read_csv('AAPL.csv') c = df.index[df['Volume_Pct_Change'] >= 1.2] for idx in c: d = df.iloc[(idx-7):(idx+30)] d.to_csv('{}.csv'.format(idx))

has the desired outcome and is pretty fast.

Upvotes: 1

Views: 129

Answers (2)

moys
moys

Reputation: 8033

Something like this maybe?

c=df.index[df['Volume_Pct_Change'] >= 1.5]
for idx in c:
    df.iloc[(idx-7):(idx+30)]

Upvotes: 1

abhilb
abhilb

Reputation: 5757

Use pandas df iloc to get the list of row indices satisfying the condition of >150%. Then transform the list to a new list with range of -7 to +30 of each value in the list. Use this new list to get the desired result.

Upvotes: 1

Related Questions