Reputation: 8521
I am analysing a stock market data and I was able to get only the open, high, low, close, and volume. Now I wanted to calculate the percentage increase for each day using Pandas. The following is my dataframe:
>>> df.head()
date open high low close volume
0.0 Aug 18, 2016 1,250.00 1,294.85 1,250.00 1,293.25 1,312,905
1.0 Aug 17, 2016 1,240.00 1,275.00 1,235.05 1,243.85 1,704,985
2.0 Aug 16, 2016 1,297.00 1,297.95 1,206.65 1,237.10 3,054,180
3.0 Aug 12, 2016 1,406.25 1,406.25 1,176.75 1,276.40 8,882,899
4.0 Aug 11, 2016 1,511.85 1,584.50 1,475.00 1,580.00 1,610,322
Then I needed the previous days close, so I used the shift
method and is as follows:
>>> df['pre_close'] = df['close'].shift(-1)
>>> df.head()
date open high low close volume \
0.0 Aug 18, 2016 1,250.00 1,294.85 1,250.00 1,293.25 1,312,905
1.0 Aug 17, 2016 1,240.00 1,275.00 1,235.05 1,243.85 1,704,985
2.0 Aug 16, 2016 1,297.00 1,297.95 1,206.65 1,237.10 3,054,180
3.0 Aug 12, 2016 1,406.25 1,406.25 1,176.75 1,276.40 8,882,899
4.0 Aug 11, 2016 1,511.85 1,584.50 1,475.00 1,580.00 1,610,322
pre_close
0.0 1,243.85
1.0 1,237.10
2.0 1,276.40
3.0 1,580.00
4.0 1,510.05
Now I wanted to calculate the percentage increase for each day, but all my data was in string so I replaced the commas
with ''
and is as follows:
>>> df.dtypes
date object
open object
high object
low object
close object
volume object
tomm_close object
dtype: object
>>> df = df.replace({',': ''}, regex=True)
Now my main problem starts, I wanted to do the following arithmetic operation:
% increase = (New Number - Original Number) ÷ Original Number × 100.
And to do arithmetic operations we need to to have float data type and I wrote a code which converts the data type and calculate the profit, and is as follows:
>>> df['per']=((df['close'].astype(float) \
.sub(df['pre_close'].astype(float), axis=0)) \
.div(df['close'].astype(float),axis=0)) \
.mul(float(100))
>>> df.head()
date open high low close volume pre_close \
0.0 Aug 18 2016 1250.00 1294.85 1250.00 1293.25 1312905 1243.85
1.0 Aug 17 2016 1240.00 1275.00 1235.05 1243.85 1704985 1237.10
2.0 Aug 16 2016 1297.00 1297.95 1206.65 1237.10 3054180 1276.40
3.0 Aug 12 2016 1406.25 1406.25 1176.75 1276.40 8882899 1580.00
4.0 Aug 11 2016 1511.85 1584.50 1475.00 1580.00 1610322 1510.05
per
0.0 3.819834
1.0 0.542670
2.0 -3.176784
3.0 -23.785647
4.0 4.427215
My code is working correctly, but my doubt is is there any better way than this? Am I doing the type conversion correctly and is that the correct way of using multiple arithmetic operations for a single operation? Thanks for the help.
Upvotes: 0
Views: 1826
Reputation: 795
If you want to do any kind of arithmetic operations you can do it easily by this way:
for col in df.columns:
if i == 'date':
pass
else:
df[col] = pd.to_numeric(df[col], errors = 'coerce')
and then arithmetic operations can be done easily just by using arithmetic operators, in your case
df['per'] = ((df['close'] - df['pre_close'])/df['close']) * 100
Upvotes: 0
Reputation: 214957
There is a pct_change()
function for calculating the percent change between current day and previous day, which you can use (note the NA
here is due to the fact that I have access to only five rows of your data):
df['per'] = (df.close.replace({',':''}, regex=True).astype(float)
.pct_change().shift(-1) * 100)
Upvotes: 5