Reputation: 43
I am new to python and learning it by doing some basic stock data analysis. Below is the dataframe I am using
date open high ... close volume
0 2010-01-05 09:16:00 5282.00 5283.10 ... 5281.10 94700 NaN
1 2010-01-05 12:16:00 5281.60 5281.60 ... 5278.30 49100 NaN
2 2010-01-05 16:16:00 5278.50 5280.50 ... 5278.80 62550 NaN
3 2010-01-06 09:16:00 5278.80 5279.45 ... 5277.30 64850 NaN
4 2010-01-06 12:16:00 5277.95 5278.00 ... 5276.00 65251 NaN
As you can see its a timeseries where there are different timeslots within a day. So I want to find the prtc_change (Percentage change) open of 2010-01-06 09:16:00
as compared to the close of 2010-01-05 16:16:00
.
how would I calculate it?
This is the kind of output I am looking for:
date open high ... close volume %change
0 2010-01-05 09:16:00 5282.00 5283.10 ... 5281.10 94700
1 2010-01-05 12:16:00 5281.60 5281.60 ... 5278.30 49100
2 2010-01-05 16:16:00 5278.50 5280.50 ... 5278.80* 62550
3 2010-01-06 09:16:00 5278.80* 5279.45 ... 5277.30 64850 0
4 2010-01-06 12:16:00 5277.95 5278.00 ... 5276.00 65251
The %change
column has 0 for 2010-01-05
-close to 2010-01-05 9:16
-open because the open = close (5278.80 == 5278.80) (marked by *).
Note : I have manipulated the data a little bit as I was working on it . below are the codes
import pandas as pd
import datetime
df = pd.read_csv(r'C:\Users\Admin\Desktop\Python files\nifty.txt' , sep = ';' , names = ["dates","open","high","low","close","volume"])
## fomration the date and time
df['dates'] = pd.to_datetime(df['dates'].astype(str) , format='%Y%m%d %H%M%S' )
## splitting the datetime column into date and time
df['date'] = [d.date() for d in df['dates']]
df['time'] = [d.time() for d in df['dates']]
the current dataframe looks like :
dates open high ... volume date time
0 2010-01-05 09:16:00 5282.00 5283.10 ... 94700 2010-01-05 09:16:00
1 2010-01-05 12:16:00 5281.60 5281.60 ... 49100 2010-01-05 12:16:00
2 2010-01-05 16:16:00 5278.50 5280.50 ... 62550 2010-01-05 16:16:00
3 2010-01-06 09:16:00 5278.80 5279.45 ... 64850 2010-01-05 09:16:00
4 2010-01-06 12:16:00 5277.95 5278.00 ... 65251 2010-01-05 12:16:00
Upvotes: 3
Views: 5464
Reputation: 71689
Use:
df['dates'] = pd.to_datetime(df['dates'])
close = df['close'].shift()
df['% change'] = np.where(
df['date'].dt.day.diff().gt(0), ((df['open'] - close) / close) * 100, '')
Result:
# print(df)
date open high close volume % change
0 2010-01-05 09:16:00 5282.00 5283.10 5281.1 94700
1 2010-01-05 12:16:00 5281.60 5281.60 5278.3 49100
2 2010-01-05 16:16:00 5278.50 5280.50 5278.8 62550
3 2010-01-06 09:16:00 5278.80 5279.45 5277.3 64850 0.0
4 2010-01-06 12:16:00 5277.95 5278.00 5276.0 65251
Upvotes: 3
Reputation: 30971
Pandas has pct_change function, but it computes the percent change between consecutive elements of a source Series, or for each column of numeric type in a source DataFrame.
So in your case it is useless, and you need a different approach:
The first step is to find the first open and last close on each day:
days = df.groupby(df.date.dt.date).agg({'open': 'first', 'close': 'last'})
Then, calculate the percent change:
100 * (days.open - days.close.shift()) / days.open
Details:
days.open
- the earliest open from the current day.days.close.shift()
- the latest close from the previous day.100 * ...
- to express the result as the number of percent.The second step is to "join" these data with the original DataFrame (create a new column):
Define a function computing %change column for a group of rows for particular day:
def pctChg(grp):
rv = pd.Series('', index=grp.index)
chg = days.pct.asof(grp.iloc[0, 0])
if pd.notnull(chg): rv.iloc[0] = chg
return rv
Then create the new column:
df['%change'] = df.groupby(df.date.dt.date)\
.apply(pctChg).reset_index(level=0, drop=True)
Upvotes: 4