Fudgster
Fudgster

Reputation: 43

Percentage change on a time series in pandas

I am new to python and learning it by doing some basic stock data analysis. Below is the dataframe I am using

                      date      open      high  ...       close  volume 
0      2010-01-05 09:16:00   5282.00   5283.10  ...   5281.10  94700    NaN
1      2010-01-05 12:16:00   5281.60   5281.60  ...   5278.30  49100    NaN
2      2010-01-05 16:16:00   5278.50   5280.50  ...   5278.80  62550    NaN
3      2010-01-06 09:16:00   5278.80   5279.45  ...   5277.30  64850    NaN
4      2010-01-06 12:16:00   5277.95   5278.00  ...   5276.00  65251    NaN

As you can see its a timeseries where there are different timeslots within a day. So I want to find the prtc_change (Percentage change) open of 2010-01-06 09:16:00 as compared to the close of 2010-01-05 16:16:00. how would I calculate it?

This is the kind of output I am looking for:

                      date      open      high  ...       close  volume %change
0      2010-01-05 09:16:00   5282.00   5283.10  ...   5281.10  94700    
1      2010-01-05 12:16:00   5281.60   5281.60  ...   5278.30  49100    
2      2010-01-05 16:16:00   5278.50   5280.50  ...   5278.80*  62550    
3      2010-01-06 09:16:00   5278.80*   5279.45  ...   5277.30  64850    0
4      2010-01-06 12:16:00   5277.95   5278.00  ...   5276.00  65251    

The %change column has 0 for 2010-01-05-close to 2010-01-05 9:16-open because the open = close (5278.80 == 5278.80) (marked by *).

Note : I have manipulated the data a little bit as I was working on it . below are the codes

import pandas as pd
import datetime

df = pd.read_csv(r'C:\Users\Admin\Desktop\Python files\nifty.txt' , sep = ';' , names = ["dates","open","high","low","close","volume"])
## fomration the date and time
df['dates'] = pd.to_datetime(df['dates'].astype(str) , format='%Y%m%d %H%M%S' )
## splitting the datetime column into date and time
df['date'] = [d.date() for d in df['dates']]
df['time'] = [d.time() for d in df['dates']]

the current dataframe looks like :

                     dates      open      high  ...  volume        date      time
0      2010-01-05 09:16:00   5282.00   5283.10  ...   94700  2010-01-05  09:16:00
1      2010-01-05 12:16:00   5281.60   5281.60  ...   49100  2010-01-05  12:16:00
2      2010-01-05 16:16:00   5278.50   5280.50  ...   62550  2010-01-05  16:16:00
3      2010-01-06 09:16:00   5278.80   5279.45  ...   64850  2010-01-05  09:16:00
4      2010-01-06 12:16:00   5277.95   5278.00  ...   65251  2010-01-05  12:16:00

Upvotes: 3

Views: 5464

Answers (2)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use:

df['dates'] = pd.to_datetime(df['dates'])
close = df['close'].shift()
df['% change'] = np.where(
    df['date'].dt.day.diff().gt(0), ((df['open'] - close) / close) * 100, '')

Result:

# print(df)
                 date     open     high   close  volume % change
0 2010-01-05 09:16:00  5282.00  5283.10  5281.1   94700         
1 2010-01-05 12:16:00  5281.60  5281.60  5278.3   49100         
2 2010-01-05 16:16:00  5278.50  5280.50  5278.8   62550         
3 2010-01-06 09:16:00  5278.80  5279.45  5277.3   64850      0.0
4 2010-01-06 12:16:00  5277.95  5278.00  5276.0   65251         

Upvotes: 3

Valdi_Bo
Valdi_Bo

Reputation: 30971

Pandas has pct_change function, but it computes the percent change between consecutive elements of a source Series, or for each column of numeric type in a source DataFrame.

So in your case it is useless, and you need a different approach:

  1. The first step is to find the first open and last close on each day:

     days = df.groupby(df.date.dt.date).agg({'open': 'first', 'close': 'last'})
    
  2. Then, calculate the percent change:

     100 * (days.open - days.close.shift()) / days.open
    

Details:

  • days.open - the earliest open from the current day.
  • days.close.shift() - the latest close from the previous day.
  • 100 * ... - to express the result as the number of percent.

The second step is to "join" these data with the original DataFrame (create a new column):

  1. Define a function computing %change column for a group of rows for particular day:

     def pctChg(grp):
         rv = pd.Series('', index=grp.index)
         chg = days.pct.asof(grp.iloc[0, 0])
         if pd.notnull(chg): rv.iloc[0] = chg
         return rv
    
  2. Then create the new column:

     df['%change'] = df.groupby(df.date.dt.date)\
         .apply(pctChg).reset_index(level=0, drop=True)
    

Upvotes: 4

Related Questions