Reputation: 55
I have a large dataframe that contains fundamental data for equities. A sample of the head and tail of the dataframe (data) is included at the end of this post. It has data for each security and each year from 2005 to 2015. Note the 'calendardate' column.
My goal is to go to each row, take the 'revenueusd' datapoint, and divide it by the 'revenueusd' datapoint for the year before to get the one-year revenue growth for each security. The prior-year datapoint is located using the ticker and calendardate.
I have been trying to use the apply function with a lambda but it is not working. Below is the code I have been trying:
import datetime

def conversion(tick, dates, dataframe):
    date1 = datetime.datetime.strptime(dates, "%Y-%m-%d").date()
    date2 = datetime.date(date1.year - 1, date1.month, date1.day).strftime("%Y-%m-%d")
    current = dataframe[(dataframe['ticker'] == tick) & (dataframe['calendardate'] == dates)]['revenueusd']
    previous = dataframe[(dataframe['ticker'] == tick) & (dataframe['calendardate'] == date2)]['revenueusd']
    growth = current / previous - 1
    return growth

data['1yrRevenueGrowth'] = data.apply(lambda x: conversion(x['ticker'], x['calendardate'], data), axis=1)
I have been stuck on this for days and searching the forums relentlessly. Any help would be greatly appreciated!
,ticker,ticker.1,calendardate,revenueusd,gp,rnd
0,A,A,2015-12-31,4038000000,2041000000,330000000
1,AA,AA,2015-12-31,22534000000,4465000000,238000000
2,AAL,AAL,2015-12-31,40990000000,23911000000,0
3,AAP,AAP,2015-12-31,9737018000,4422772000,0
4,AAPL,AAPL,2015-12-31,234988000000,94308000000,8576000000
5,ABBV,ABBV,2015-12-31,22859000000,18359000000,4435000000
509,A,A,2014-12-31,6981000000,3593000000,719000000
510,AA,AA,2014-12-31,23906000000,4769000000,218000000
511,AAPL,AAPL,2014-12-31,199800000000,78432000000,6606000000
512,ABBV,ABBV,2014-12-31,19960000000,15534000000,3649000000
Upvotes: 1
Views: 80
Reputation: 25629
Starting with this:
ticker ticker.1 calendardate revenueusd gp rnd
0 A A 2015-12-31 4038000000 2041000000 330000000
1 AA AA 2015-12-31 22534000000 4465000000 238000000
2 A A 2014-12-31 403800000 204100000 330000000
3 AA AA 2014-12-31 2253400000 446500000 238000000
4 A A 2013-12-31 403800000 20410000 330000000
5 AA AA 2013-12-31 225340000 44650000 238000000
6 A A 2012-12-31 403800 2041000 330000000
7 AA AA 2012-12-31 22534000 4465000 238000000
df["pct"] = df.groupby("ticker")['revenueusd'].pct_change()
ticker ticker.1 calendardate revenueusd gp rnd pct
0 A A 2015-12-31 4038000000 2041000000 330000000 NaN
1 AA AA 2015-12-31 22534000000 4465000000 238000000 NaN
2 A A 2014-12-31 403800000 204100000 330000000 -0.900
3 AA AA 2014-12-31 2253400000 446500000 238000000 -0.900
4 A A 2013-12-31 403800000 20410000 330000000 0.000
5 AA AA 2013-12-31 225340000 44650000 238000000 -0.900
6 A A 2012-12-31 403800 2041000 330000000 -0.999
7 AA AA 2012-12-31 22534000 4465000 238000000 -0.900
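Note that pct_change compares each row with the previous row in its group. The frame above is ordered newest-first, so the values shown are changes going backwards in time. Sort the DataFrame by calendardate in ascending order before applying groupby to get year-over-year growth.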
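A minimal sketch of that sorting step, assuming the column names from the question and ISO-formatted date strings (which sort correctly as text). The four rows are taken from the question's sample, and the output column name '1yrRevenueGrowth' is the one the asker used:

```python
import pandas as pd

# Small frame ordered newest-first, using rows from the question's sample.
df = pd.DataFrame({
    "ticker": ["A", "AA", "A", "AA"],
    "calendardate": ["2015-12-31", "2015-12-31", "2014-12-31", "2014-12-31"],
    "revenueusd": [4038000000, 22534000000, 6981000000, 23906000000],
})

# Sort oldest-first within each ticker so "previous row" means "previous year".
df = df.sort_values(["ticker", "calendardate"])

# pct_change computes current / previous - 1 within each ticker group;
# the first row of each group has no predecessor and becomes NaN.
df["1yrRevenueGrowth"] = df.groupby("ticker")["revenueusd"].pct_change()
print(df)
```

Because the result is aligned on the original index, it can be assigned straight back as a column without reshaping.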
Upvotes: 0
Reputation: 425
There is a nice function called Series.pct_change for your purpose. You could do it for example like this:
import pandas as pd
data = pd.read_csv("data.csv", index_col=0)
data.groupby("ticker").apply(lambda x : x.set_index("calendardate").sort_index()["revenueusd"].pct_change())
For each ticker value, a series is constructed that is sorted by the calendar date, and then the function pct_change is applied. By default this function computes the fractional change between two consecutive entries, that is, current / previous - 1.
ticker calendardate
A 2014-12-31 NaN
2015-12-31 -0.421573
AA 2014-12-31 NaN
2015-12-31 -0.057391
AAL 2015-12-31 NaN
AAP 2015-12-31 NaN
AAPL 2014-12-31 NaN
2015-12-31 0.176116
ABBV 2014-12-31 NaN
2015-12-31 0.145240
Name: revenueusd, dtype: float64
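Note that pct_change compares consecutive rows, so if a ticker is missing a year, the result spans more than one year. A hedged alternative, not part of the answer above, is to look up the prior year explicitly with a self-merge, which is closer to what the question describes. In this sketch the helper columns `year` and `prev_revenueusd` are names introduced for illustration, and the 2012 revenue figure is made up to create a gap:

```python
import pandas as pd

data = pd.DataFrame({
    "ticker": ["A", "A", "A", "AA", "AA"],
    "calendardate": ["2015-12-31", "2014-12-31", "2012-12-31",
                     "2015-12-31", "2014-12-31"],
    # The 2012 value for A is hypothetical; the others come from the question.
    "revenueusd": [4038000000, 6981000000, 6000000000,
                   22534000000, 23906000000],
})

# Reduce each date to its calendar year (assumes one row per ticker per year).
data["year"] = pd.to_datetime(data["calendardate"]).dt.year

# Prior-year lookup table: adding 1 to the year makes the 2014 row
# line up with the 2015 row in the merge below.
prev = data[["ticker", "year", "revenueusd"]].copy()
prev["year"] += 1
prev = prev.rename(columns={"revenueusd": "prev_revenueusd"})

# A left merge keeps every original row; rows with no prior year get NaN.
merged = data.merge(prev, on=["ticker", "year"], how="left")
merged["1yrRevenueGrowth"] = merged["revenueusd"] / merged["prev_revenueusd"] - 1
print(merged[["ticker", "year", "1yrRevenueGrowth"]])
```

Here ticker A's 2014 row gets NaN because 2013 is missing, whereas a row-based pct_change would silently compare 2014 with 2012.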
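One more thing: your dates are in a consistent YYYY-MM-DD format, so you can easily convert the column to type datetime. The infer_datetime_format argument is deprecated in recent pandas versions, so passing the format explicitly is the safer choice:
data["calendardate"] = pd.to_datetime(data["calendardate"], format="%Y-%m-%d")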
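As a small illustration of what the conversion buys you, the sketch below parses two dates from the question's sample and pulls out the calendar year through the `.dt` accessor. The variable name `years` is only for illustration:

```python
import pandas as pd

data = pd.DataFrame({"calendardate": ["2015-12-31", "2014-12-31"]})

# Parse the strings into datetime64 values using an explicit format.
data["calendardate"] = pd.to_datetime(data["calendardate"], format="%Y-%m-%d")

# Once the column is datetime-typed, date parts are available via .dt.
years = data["calendardate"].dt.year.tolist()
print(data.dtypes)
print(years)
```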
Upvotes: 1