JohnSmith

Reputation: 55

Iterating through DataFrame rows to create new column while referencing other rows

I have a large dataframe (data) that contains fundamental data for equities. The head and tail of the dataframe are shown below. It has a row for each security for each year from 2005-2015. Note the 'calendardate' column.

My goal is to go to each row, take the 'revenueusd' datapoint and divide it by the 'revenueusd' datapoint from the year before, to get the 1-year revenue growth for each security. The prior-year datapoint is located using the ticker and calendardate.

I have been trying to use the apply function with a lambda, but it is not working. Below is the code I have been trying:

import datetime

def conversion(tick, dates, dataframe):
    # build the same calendar date one year earlier
    date1 = datetime.datetime.strptime(dates, "%Y-%m-%d").date()
    date2 = datetime.date(date1.year - 1, date1.month, date1.day).strftime("%Y-%m-%d")
    # divide this year's revenue by last year's revenue for the same ticker
    growth = (dataframe[(dataframe['ticker'] == tick) & (dataframe['calendardate'] == dates)]['revenueusd']
              / dataframe[(dataframe['ticker'] == tick) & (dataframe['calendardate'] == date2)]['revenueusd'] - 1)
    return growth

data['1yrRevenueGrowth'] = data.apply(lambda x: conversion(x['ticker'], x['calendardate'], data), axis=1)
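
To make the intended calculation concrete, here is what I expect for a single row, worked out by hand from the figures in the head/tail output below (using AAPL as an example, and assuming each ticker/date pair appears exactly once in data):

# AAPL revenue is 234988000000 for 2015-12-31 and 199800000000 for 2014-12-31
aapl_2015 = data[(data['ticker'] == 'AAPL') & (data['calendardate'] == '2015-12-31')]['revenueusd'].iloc[0]
aapl_2014 = data[(data['ticker'] == 'AAPL') & (data['calendardate'] == '2014-12-31')]['revenueusd'].iloc[0]
print(aapl_2015 / aapl_2014 - 1)  # roughly 0.176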

I have been stuck on this for days and have been searching the forums relentlessly. Any help would be greatly appreciated!

Output of data.head(5) and data.tail(5) (combined):

,ticker,ticker.1,calendardate,revenueusd,gp,rnd  
0,A,A,2015-12-31,4038000000,2041000000,330000000  
1,AA,AA,2015-12-31,22534000000,4465000000,238000000  
2,AAL,AAL,2015-12-31,40990000000,23911000000,0  
3,AAP,AAP,2015-12-31,9737018000,4422772000,0  
4,AAPL,AAPL,2015-12-31,234988000000,94308000000,8576000000  
5,ABBV,ABBV,2015-12-31,22859000000,18359000000,4435000000  
509,A,A,2014-12-31,6981000000,3593000000,719000000  
510,AA,AA,2014-12-31,23906000000,4769000000,218000000  
511,AAPL,AAPL,2014-12-31,199800000000,78432000000,6606000000  
512,ABBV,ABBV,2014-12-31,19960000000,15534000000,3649000000 

Upvotes: 1

Views: 80

Answers (2)

Merlin

Reputation: 25629

Starting with this:

 ticker ticker.1 calendardate   revenueusd          gp      rnd  
0      A        A   2015-12-31   4038000000  2041000000  330000000
1     AA       AA   2015-12-31  22534000000  4465000000  238000000
2      A        A   2014-12-31    403800000   204100000  330000000
3     AA       AA   2014-12-31   2253400000   446500000  238000000
4      A        A   2013-12-31    403800000    20410000  330000000
5     AA       AA   2013-12-31    225340000    44650000  238000000
6      A        A   2012-12-31       403800     2041000  330000000
7     AA       AA   2012-12-31     22534000     4465000  238000000


df["pct"] =  df.groupby("ticker")['revenueusd'].pct_change()



 ticker ticker.1 calendardate   revenueusd          gp      rnd      pct
0      A        A   2015-12-31   4038000000  2041000000  330000000    NaN
1     AA       AA   2015-12-31  22534000000  4465000000  238000000    NaN
2      A        A   2014-12-31    403800000   204100000  330000000 -0.900
3     AA       AA   2014-12-31   2253400000   446500000  238000000 -0.900
4      A        A   2013-12-31    403800000    20410000  330000000  0.000
5     AA       AA   2013-12-31    225340000    44650000  238000000 -0.900
6      A        A   2012-12-31       403800     2041000  330000000 -0.999
7     AA       AA   2012-12-31     22534000     4465000  238000000 -0.900

You may need to sort the DataFrame first so that each ticker's rows are in chronological order before applying pct_change.
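
For example, a sketch of an explicit sort first (assuming the column names from the question, with each ticker's rows ordered by date ascending so that pct_change compares every year with the year before):

# oldest year first within each ticker, then year-over-year change of revenue
df = df.sort_values(["ticker", "calendardate"])
df["pct"] = df.groupby("ticker")['revenueusd'].pct_change()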

Upvotes: 0

Michael Franzen

Reputation: 425

There is a nice function, Series.pct_change, for exactly this purpose. You could use it, for example, like this:

import pandas as pd

data = pd.read_csv("data.csv", index_col=0)
# per ticker: index by calendardate, sort chronologically, then compute the change in revenueusd
data.groupby("ticker").apply(lambda x: x.set_index("calendardate").sort_index()["revenueusd"].pct_change())

For each ticker value, a series is constructed that is indexed and sorted by the calendar date, and then pct_change is applied (by default this function computes the fractional change between two consecutive entries).

ticker  calendardate
A       2014-12-31           NaN
        2015-12-31     -0.421573
AA      2014-12-31           NaN
        2015-12-31     -0.057391
AAL     2015-12-31           NaN
AAP     2015-12-31           NaN
AAPL    2014-12-31           NaN
        2015-12-31      0.176116
ABBV    2014-12-31           NaN
        2015-12-31      0.145240
Name: revenueusd, dtype: float64
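
If you want this back in your original frame as the 1yrRevenueGrowth column from the question, one possible follow-up (a sketch, not verified against your full dataset) is to name the result and merge it back on ticker and calendardate:

growth = (data.groupby("ticker")
              .apply(lambda x: x.set_index("calendardate").sort_index()["revenueusd"].pct_change())
              .rename("1yrRevenueGrowth")
              .reset_index())
data = data.merge(growth, on=["ticker", "calendardate"], how="left")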

One more thing: your dates are nicely formatted, so you can easily convert the column to datetime like this:

data["calendardate"] = pd.to_datetime(data["calendardate"], infer_datetime_format=True)

Upvotes: 1
