user4069366


Computing daily return/increment on a DataFrame

I have some time series data on which I want to compute the daily return/increment, where daily increment = value_at_time(T) / value_at_time(T-1)

import pandas as pd
df=pd.DataFrame([1,2,3,7]) #Sample data frame
df[1:]
out:
 0
 1  2
 2  3
 3  7
df[:-1]
out: 
    0
 0  1
 1  2
 2  3
######### Method 1
df[1:]/df[:-1]
out:
    0
0 NaN
1   1
2   1
3 NaN

######### Method 2
df[1:]/df[:-1].values
out:
          0
1  2.000000
2  1.500000
3  2.333333

######### Method 3
df[1:].values/df[:-1]
out:
   0
0  2
1  1
2  2

My questions are:

  1. If df[:-1] and df[1:] each have only three values (row slices of the dataframe), why doesn't Method 1 work?
  2. Why do Methods 2 and 3, which are almost the same, give different results?
  3. Why does using .values in Method 2 make it work?

Upvotes: 2

Views: 963

Answers (1)

EdChum

Reputation: 394419

Let's look at each method in turn.

Method 1: if you look at what the slices return, you can see that the indices don't align:

In [87]:
print(df[1:])
print(df[:-1])

   0
1  2
2  3
3  7
   0
0  1
1  2
2  3

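So when you do the division, only two rows (index labels 1 and 2) are present in both operands. At those labels each value is divided by itself, giving 1.0, and every other label gets NaN: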

In [88]:
df[1:]/df[:-1]

Out[88]:
     0
0  NaN
1  1.0
2  1.0
3  NaN
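
To make the alignment visible, here is a minimal sketch, assuming a recent pandas on Python 3. The names top, bottom, ratio and shared are only for illustration:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 7])

top = df[1:]      # index labels 1, 2, 3
bottom = df[:-1]  # index labels 0, 1, 2

# Division aligns on index labels first, so the result covers the union 0..3.
ratio = top / bottom
print(ratio)

# Labels present in both operands: only these rows get a real value.
shared = top.index.intersection(bottom.index)
print(list(shared))
```

Rows whose label appears in only one of the two slices have nothing to divide by, which is where the NaN values come from.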

In method 2, .values returns a NumPy array. It has no index, so there is nothing to align and the division is performed element-wise by position, as expected:

In [89]:
df[:-1].values

Out[89]:
array([[1],
       [2],
       [3]], dtype=int64)

Giving:

In [90]:
df[1:]/df[:-1].values

Out[90]:
          0
1  2.000000
2  1.500000
3  2.333333

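Method 3 works for the same reason as method 2: one operand is a plain array, so the division is positional. The result takes its index from df[:-1] (0 to 2) instead of from df[1:] (1 to 3). The whole-number values in your output look like integer division, which is how / treated integers on Python 2; on Python 3 the values should match method 2.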
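
A small sketch comparing the two, assuming a recent pandas on Python 3 (to_numpy() is the currently recommended spelling of .values; m2 and m3 are illustrative names):

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 7])

# Method 2: DataFrame / array, so the result keeps the index of df[1:].
m2 = df[1:] / df[:-1].to_numpy()

# Method 3: array / DataFrame, so the result keeps the index of df[:-1].
m3 = df[1:].to_numpy() / df[:-1]

print(list(m2.index))
print(list(m3.index))
print(m2.to_numpy().ravel())
print(m3.to_numpy().ravel())
```

Under these assumptions the values are the same and only the index labels differ, because the label-carrying operand is the one that supplies the index.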

So how do you do this in pure pandas? Use shift to move the values so that the indices align as desired:

In [92]:
df.shift(-1)/df

Out[92]:
          0
0  2.000000
1  1.500000
2  2.333333
3       NaN
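
If you would rather have each ratio stored at time T instead of T-1, one possible variation is to shift the denominator instead. This is a sketch assuming a recent pandas; daily and via_pct are illustrative names, and pct_change is the built-in that returns value(T)/value(T-1) - 1:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 7])

# Ratio stored at time T: value(T) / value(T-1).
# The first row has no predecessor, so it is NaN.
daily = df / df.shift(1)
print(daily)

# pct_change() gives value(T)/value(T-1) - 1, so adding 1 recovers the ratio.
via_pct = df.pct_change() + 1
print(via_pct)
```

Both forms keep the original index, so the result can be assigned straight back as a new column.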

Upvotes: 1
