Python Pandas - multiple arithmetic operation in a single dataframe

Question

I am analysing a stock market data and I was able to get only the open, high, low, close, and volume. Now I wanted to calculate the percentage increase for each day using Pandas. The following is my dataframe:

>>> df.head()
             date      open      high       low     close     volume
0.0  Aug 18, 2016  1,250.00  1,294.85  1,250.00  1,293.25  1,312,905
1.0  Aug 17, 2016  1,240.00  1,275.00  1,235.05  1,243.85  1,704,985
2.0  Aug 16, 2016  1,297.00  1,297.95  1,206.65  1,237.10  3,054,180
3.0  Aug 12, 2016  1,406.25  1,406.25  1,176.75  1,276.40  8,882,899
4.0  Aug 11, 2016  1,511.85  1,584.50  1,475.00  1,580.00  1,610,322

Then I needed the previous days close, so I used the shift method and is as follows:

>>> df['pre_close'] = df['close'].shift(-1)
>>> df.head()
             date      open      high       low     close     volume  \
0.0  Aug 18, 2016  1,250.00  1,294.85  1,250.00  1,293.25  1,312,905   
1.0  Aug 17, 2016  1,240.00  1,275.00  1,235.05  1,243.85  1,704,985   
2.0  Aug 16, 2016  1,297.00  1,297.95  1,206.65  1,237.10  3,054,180   
3.0  Aug 12, 2016  1,406.25  1,406.25  1,176.75  1,276.40  8,882,899   
4.0  Aug 11, 2016  1,511.85  1,584.50  1,475.00  1,580.00  1,610,322   

    pre_close  
0.0   1,243.85  
1.0   1,237.10  
2.0   1,276.40  
3.0   1,580.00  
4.0   1,510.05

Now I wanted to calculate the percentage increase for each day, but all my data was in string so I replaced the commas with '' and is as follows:

>>> df.dtypes
date          object
open          object
high          object
low           object
close         object
volume        object
tomm_close    object
dtype: object

>>> df = df.replace({',': ''}, regex=True)

Now my main problem starts, I wanted to do the following arithmetic operation:

% increase = (New Number - Original Number) ÷ Original Number × 100.

And to do arithmetic operations we need to to have float data type and I wrote a code which converts the data type and calculate the profit, and is as follows:

>>> df['per']=((df['close'].astype(float) \
            .sub(df['pre_close'].astype(float), axis=0)) \
            .div(df['close'].astype(float),axis=0)) \
            .mul(float(100))

>>> df.head()
            date     open     high      low    close   volume pre_close  \
0.0  Aug 18 2016  1250.00  1294.85  1250.00  1293.25  1312905    1243.85   
1.0  Aug 17 2016  1240.00  1275.00  1235.05  1243.85  1704985    1237.10   
2.0  Aug 16 2016  1297.00  1297.95  1206.65  1237.10  3054180    1276.40   
3.0  Aug 12 2016  1406.25  1406.25  1176.75  1276.40  8882899    1580.00   
4.0  Aug 11 2016  1511.85  1584.50  1475.00  1580.00  1610322    1510.05   

           per  
0.0   3.819834  
1.0   0.542670  
2.0  -3.176784  
3.0 -23.785647  
4.0   4.427215

My code is working correctly, but my doubt is is there any better way than this? Am I doing the type conversion correctly and is that the correct way of using multiple arithmetic operations for a single operation? Thanks for the help.

akuiper · Accepted Answer

There is a pct_change() function for calculating the percent change between current day and previous day, which you can use (note the NA here is due to the fact that I have access to only five rows of your data):

df['per'] = (df.close.replace({',':''}, regex=True).astype(float)
               .pct_change().shift(-1) * 100)

Python Pandas - multiple arithmetic operation in a single dataframe

Answers (2)

Related Questions