JohnSmith

Reputation: 55

Iterating through DataFrame rows to create new column while referencing other rows

I have a large dataframe (data) that contains fundamental data for equities. The head and tail of the dataframe are shown below. It has a row for each security for each year from 2005-2015. Note the 'calendardate' column.

My goal is to go to each row, take the 'revenueusd' datapoint and divide it by the 'revenueusd' datapoint from the year before, to get the 1-year revenue growth for each security. The prior-year datapoint is located using the ticker and calendardate.

I have been trying to use the apply function with a lambda, but it is not working. Below is the code I have been trying:

import datetime

def conversion(tick, dates, dataframe):
    # build the same calendar date one year earlier
    date1 = datetime.datetime.strptime(dates, "%Y-%m-%d").date()
    date2 = datetime.date(date1.year - 1, date1.month, date1.day).strftime("%Y-%m-%d")
    # divide this year's revenue by last year's revenue for the same ticker
    growth = (dataframe[(dataframe['ticker'] == tick) & (dataframe['calendardate'] == dates)]['revenueusd']
              / dataframe[(dataframe['ticker'] == tick) & (dataframe['calendardate'] == date2)]['revenueusd'] - 1)
    return growth

data['1yrRevenueGrowth'] = data.apply(lambda x: conversion(x['ticker'], x['calendardate'], data), axis=1)
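
To make the intended calculation concrete, here is what I expect for a single row, worked out by hand from the figures in the head/tail output below (using AAPL as an example, and assuming each ticker/date pair appears exactly once in data):

# AAPL revenue is 234988000000 for 2015-12-31 and 199800000000 for 2014-12-31
aapl_2015 = data[(data['ticker'] == 'AAPL') & (data['calendardate'] == '2015-12-31')]['revenueusd'].iloc[0]
aapl_2014 = data[(data['ticker'] == 'AAPL') & (data['calendardate'] == '2014-12-31')]['revenueusd'].iloc[0]
print(aapl_2015 / aapl_2014 - 1)  # roughly 0.176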

I have been stuck on this for days and have been searching the forums relentlessly. Any help would be greatly appreciated!

Output of data.head(5) and data.tail(5) (combined):

,ticker,ticker.1,calendardate,revenueusd,gp,rnd  
0,A,A,2015-12-31,4038000000,2041000000,330000000  
1,AA,AA,2015-12-31,22534000000,4465000000,238000000  
2,AAL,AAL,2015-12-31,40990000000,23911000000,0  
3,AAP,AAP,2015-12-31,9737018000,4422772000,0  
4,AAPL,AAPL,2015-12-31,234988000000,94308000000,8576000000  
5,ABBV,ABBV,2015-12-31,22859000000,18359000000,4435000000  
509,A,A,2014-12-31,6981000000,3593000000,719000000  
510,AA,AA,2014-12-31,23906000000,4769000000,218000000  
511,AAPL,AAPL,2014-12-31,199800000000,78432000000,6606000000  
512,ABBV,ABBV,2014-12-31,19960000000,15534000000,3649000000 

Upvotes: 1

Views: 80

Answers (2)

Merlin

Reputation: 25629

Starting with this:

 ticker ticker.1 calendardate   revenueusd          gp      rnd  
0      A        A   2015-12-31   4038000000  2041000000  330000000
1     AA       AA   2015-12-31  22534000000  4465000000  238000000
2      A        A   2014-12-31    403800000   204100000  330000000
3     AA       AA   2014-12-31   2253400000   446500000  238000000
4      A        A   2013-12-31    403800000    20410000  330000000
5     AA       AA   2013-12-31    225340000    44650000  238000000
6      A        A   2012-12-31       403800     2041000  330000000
7     AA       AA   2012-12-31     22534000     4465000  238000000


df["pct"] =  df.groupby("ticker")['revenueusd'].pct_change()



 ticker ticker.1 calendardate   revenueusd          gp      rnd      pct
0      A        A   2015-12-31   4038000000  2041000000  330000000    NaN
1     AA       AA   2015-12-31  22534000000  4465000000  238000000    NaN
2      A        A   2014-12-31    403800000   204100000  330000000 -0.900
3     AA       AA   2014-12-31   2253400000   446500000  238000000 -0.900
4      A        A   2013-12-31    403800000    20410000  330000000  0.000
5     AA       AA   2013-12-31    225340000    44650000  238000000 -0.900
6      A        A   2012-12-31       403800     2041000  330000000 -0.999
7     AA       AA   2012-12-31     22534000     4465000  238000000 -0.900

You may need to sort the DataFrame first so that each ticker's rows are in chronological order before applying pct_change.
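
For example, a sketch of an explicit sort first (assuming the column names from the question, with each ticker's rows ordered by date ascending so that pct_change compares every year with the year before):

# oldest year first within each ticker, then year-over-year change of revenue
df = df.sort_values(["ticker", "calendardate"])
df["pct"] = df.groupby("ticker")['revenueusd'].pct_change()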

Upvotes: 0

Michael Franzen

Reputation: 425

There is a nice function, Series.pct_change, for exactly this purpose. You could use it, for example, like this:

import pandas as pd

data = pd.read_csv("data.csv", index_col=0)
# per ticker: index by calendardate, sort chronologically, then compute the change in revenueusd
data.groupby("ticker").apply(lambda x: x.set_index("calendardate").sort_index()["revenueusd"].pct_change())

For each ticker value, a series is constructed that is indexed and sorted by the calendar date, and then pct_change is applied (by default this function computes the fractional change between two consecutive entries).

ticker  calendardate
A       2014-12-31           NaN
        2015-12-31     -0.421573
AA      2014-12-31           NaN
        2015-12-31     -0.057391
AAL     2015-12-31           NaN
AAP     2015-12-31           NaN
AAPL    2014-12-31           NaN
        2015-12-31      0.176116
ABBV    2014-12-31           NaN
        2015-12-31      0.145240
Name: revenueusd, dtype: float64
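
If you want this back in your original frame as the 1yrRevenueGrowth column from the question, one possible follow-up (a sketch, not verified against your full dataset) is to name the result and merge it back on ticker and calendardate:

growth = (data.groupby("ticker")
              .apply(lambda x: x.set_index("calendardate").sort_index()["revenueusd"].pct_change())
              .rename("1yrRevenueGrowth")
              .reset_index())
data = data.merge(growth, on=["ticker", "calendardate"], how="left")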

One more thing: your dates are nicely formatted, so you can easily convert the column to datetime like this:

data["calendardate"] = pd.to_datetime(data["calendardate"], infer_datetime_format=True)

Upvotes: 1
