gonzo8874

Reputation: 111

Doing calculations using multiple rows in pandas

This expands on my previous question, "Return only the last day of the year with pandas?".

My current table, named last_six_years, looks like this:

   symbol       date     close
44    MMM 2014-12-31  139.6180
45    MMM 2015-12-31  131.3644
46    MMM 2016-12-30  159.8925
47    MMM 2017-12-29  215.7010
48    MMM 2018-12-31  179.2494
49    MMM 2019-12-31  171.5581

I need to create a new column (named "capital_gains_yield") whose value is calculated from the current row and the row before it, using the following formula:

capital_gains_yield = (current year close - previous year close) / previous year close

Meaning, the outcome would look something like this:

   symbol       date     close  capital_gains_yield
44    MMM 2014-12-31  139.6180                  ...
45    MMM 2015-12-31  131.3644              -0.0591
46    MMM 2016-12-30  159.8925               0.2172
47    MMM 2017-12-29  215.7010               0.3490
48    MMM 2018-12-31  179.2494              -0.1690
49    MMM 2019-12-31  171.5581              -0.0429

How can I do that?

Upvotes: 1

Views: 57

Answers (2)

BiOS

Reputation: 2304

Adding to the very good answer from Z Li, if you can't use pct_change(), you may try:

Create a column with the previous values to compare against, i.e. the values shifted by one:

df['prev_year_close'] = df['close'].shift(1)

Define a function we'll use to get the yield:

def getCGY(row):
    difference = row["close"] - row["prev_year_close"]
    return difference/row["prev_year_close"]

Use pandas' apply() to perform the above function on each row:

df['capital_gains_yield'] = df.apply(getCGY, axis=1)

Optionally delete the comparison column:

df = df.drop("prev_year_close", axis=1)

This will output:

      close  capital_gains_yield
0  139.6180                  NaN
1  131.3644            -0.059116
2  159.8925             0.217168
3  215.7010             0.349038

This gives you a bit of flexibility: if you need to change how the calculation is made, you only have to edit the getCGY function.
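
If you don't need a custom per-row function, the same shift-based idea can probably be written without apply(), which avoids calling Python code once per row. Below is a minimal sketch, assuming the data from the question has been loaded into a DataFrame called df (the sample frame here is rebuilt by hand from the question's table) and is already sorted by date:

```python
import pandas as pd

# Sample data copied from the question's table (assumed already sorted by date).
df = pd.DataFrame({
    "symbol": ["MMM"] * 6,
    "date": pd.to_datetime([
        "2014-12-31", "2015-12-31", "2016-12-30",
        "2017-12-29", "2018-12-31", "2019-12-31",
    ]),
    "close": [139.6180, 131.3644, 159.8925, 215.7010, 179.2494, 171.5581],
})

# shift(1) moves every value down one row, so each row sees the previous close.
prev_close = df["close"].shift(1)

# (current year close - previous year close) / previous year close
df["capital_gains_yield"] = (df["close"] - prev_close) / prev_close

print(df)
```

The first row has no previous year, so its yield comes out as NaN. No helper column is left behind because prev_close is kept as a separate Series rather than stored in df.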

Upvotes: 1

Z Li

Reputation: 4318

Check the pandas.DataFrame.pct_change() function.

Assuming you have already sorted the dataframe by date:

import numpy as np
import pandas as pd
df = pd.DataFrame({'close': np.random.uniform(0, 10, 5)})  # random demo data
df['yield'] = df['close'].pct_change()  # (current - previous) / previous

Output (the values are random, so yours will differ):

    close       yield
0   9.836278    NaN
1   5.059823    -0.485596
2   0.842129    -0.833566
3   5.670353    5.733358
4   5.906502    0.041646
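
Applied to the table from the question, this might look like the sketch below. The frame is rebuilt by hand from the question's sample rows, and the groupby("symbol") step is an assumption on my part: it only matters if last_six_years holds more than one symbol, in which case it keeps one symbol's first year from being compared against another symbol's last year.

```python
import pandas as pd

# Sample rows copied from the question's table.
last_six_years = pd.DataFrame({
    "symbol": ["MMM"] * 6,
    "date": pd.to_datetime([
        "2014-12-31", "2015-12-31", "2016-12-30",
        "2017-12-29", "2018-12-31", "2019-12-31",
    ]),
    "close": [139.6180, 131.3644, 159.8925, 215.7010, 179.2494, 171.5581],
})

# pct_change() compares each row with the one before it, so sort first.
last_six_years = last_six_years.sort_values(["symbol", "date"])

# Compute the change within each symbol; the first row per symbol is NaN.
last_six_years["capital_gains_yield"] = (
    last_six_years.groupby("symbol")["close"].pct_change()
)

print(last_six_years)
```

With a single symbol, plain last_six_years["close"].pct_change() gives the same column.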

Upvotes: 3
