Reputation: 23
Hello I have multiple csvs of stocks I generated from pandas. My goal seems quite simple, I measure the percentage difference in volume day by day
for ticker in tickers:
df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
df['Volume_Pct_Change'] = df['Volume'].pct_change().fillna(0)
df.to_csv('stonks_dfs/{}.csv'.format(ticker))
This was pretty easy. Now I want to find all the times the Volume is above 150%. I did this in another python script different from pulling the original data.
with open('sp500tickers.pickle', "rb") as f:
tickers = pickle.load(f)
for ticker in tickers:
df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
df_vpc = df.loc[df['Volume_Pct_Change'] >= 1.5]
df_vpc.to_csv('{}.csv'.format(ticker))
This works pretty good for me.
My problem is I want to pull a X amount of rows before and after the selected row ie (7 rows before and 30 rows after the selected row that has the Volume_Pct_change >= 1.5) that way I can graph out and see changes over time before and after the volume spike. Bonus for me if there is a way to use Numpy with this.
Edit 1:So running
df = pd.read_csv('AAPL.csv')
c=df['Volume_Pct_Change'] >= 1.2
for idx in c:
df.iloc[(idx-7):(idx+30)]
gives c the correct output of true when the condition exists
114,False
115,False
116,False
117,False
118,True
119,False
120,False
121,False
But after running df.iloc[(idx-7):(idx+30)]
I see no change in df when I export it to a csv.
Edit 2: ok getting somewhere running
df = pd.read_csv('AAPL.csv')
c = df.index[df['Volume_Pct_Change'] >= 1.2]
for idx in c:
df.iloc[(idx-7):(idx+30)]
I see no difference in df when exporting but putting d=df.iloc[(idx-7):(idx+30)]
works but only has one set of them in the dataframe which is confusing to me. Its only saving the last one. It seems it loops through overwriting the dataframe with everyone ending with the last one?
Final Edit: Thanks everyone for your help
df = pd.read_csv('AAPL.csv')
c = df.index[df['Volume_Pct_Change'] >= 1.2]
for idx in c:
d = df.iloc[(idx-7):(idx+30)]
d.to_csv('{}.csv'.format(idx))
has the desired outcome and is pretty fast.
Upvotes: 1
Views: 129
Reputation: 8033
Something like this maybe?
c=df.index[df['Volume_Pct_Change'] >= 1.5]
for idx in c:
df.iloc[(idx-7):(idx+30)]
Upvotes: 1
Reputation: 5757
Use pandas df iloc to get the list of row indices satisfying the condition of >150%. Then transform the list to a new list with range of -7 to +30 of each value in the list. Use this new list to get the desired result.
Upvotes: 1