Reputation: 282
I have the following data:
Company Year Variables Data
ABC 2000 Revenue 10
ABC 2001 Revenue 15
ABC 2002 Revenue 12
ABC 2003 Revenue 25
ABC 2004 Revenue 30
CDE 2000 Revenue 5
CDE 2001 Revenue 8
CDE 2002 Revenue 17
CDE 2003 Revenue 9
CDE 2004 Revenue 34
#etc
I want to calculate Compound Annual Growth Rate (CAGR) for past 3 years.
For example, 3 year CAGR results for each company will be:
Company Year Variables Data CAGR
ABC 2000 Revenue 10 NA
ABC 2001 Revenue 15 NA
ABC 2002 Revenue 12 6.27%
ABC 2003 Revenue 25 18.56%
ABC 2004 Revenue 30 35.72%
CDE 2000 Revenue 5 NA
CDE 2001 Revenue 8 NA
CDE 2002 Revenue 17 50.37%
CDE 2003 Revenue 9 4.00%
CDE 2004 Revenue 34 25.99%
I am using following formula across data by year:
CAGR for 2004=((LastYear/PreviousYear)^(1/n))-1
For example for n = 2
LastYear =2004
PreviousYear =2004-2 = 2002
Attempted R code for calculating CAGR for 2004 vs 2002:
library(tibble)
library(dplyr)
library(lubridate)
year<-c(rep(2000:2004,2))
company<-rep(c("ABC","CDE"),5)
variable<-rep("revenue",10)
data<-c(10,15,12,25,30,5,8,17,9,34)
tibdf<-tibble(company,year,variable,data)
View(tibdf)
#revenue2004<-tibdf%>%filter(year==2004)%>%select(company,data)
#revenue2002<-tibdf%>%filter(year==2001)%>%select(company,data)
Calculating CAGR (from Plot Compound Annual Growth Rate (3 independent variables) in R)
annual.growth.rate <- function(a){
T1 <- max(a$year) - min(a$year)+1
FV <- a[which(a$year == max(a$year)),"data"]
SV <- a[which(a$year == min(a$year)),"data"]
cagr <- ((FV/SV)^(1/T1)) -1
}
Use tibdf for a in function. Unfortunately, I am not able to apply function to my data.
Appreciate your help.
Upvotes: 6
Views: 5394
Reputation: 7724
This function calculates the CAGR for different values of of n
:
calc_cagr <- function(df, n) {
df <- df %>%
arrange(company, year) %>%
group_by(company) %>%
mutate(cagr = ((data / lag(data, n)) ^ (1 / n)) - 1)
return(df)
}
calc_cagr(tibdf, 2)
# A tibble: 10 x 5
# Groups: company [2]
# company year variable data cagr
# <chr> <int> <chr> <dbl> <dbl>
# 1 ABC 2000 revenue 10.0 NA
# 2 ABC 2001 revenue 15.0 NA
# 3 ABC 2002 revenue 12.0 0.0954
# 4 ABC 2003 revenue 25.0 0.291
# 5 ABC 2004 revenue 30.0 0.581
# 6 CDE 2000 revenue 5.00 NA
# 7 CDE 2001 revenue 8.00 NA
# 8 CDE 2002 revenue 17.0 0.844
# 9 CDE 2003 revenue 9.00 0.0607
# 10 CDE 2004 revenue 34.0 0.414
I do however get different results than you, but your questions is a little ambiguous about whether to divide by n
or n+1
.
Data
tibdf <- tibble(company = rep(c("ABC", "CDE"), each = 5),
year = rep(2000:2004, 2),
variable = rep("revenue", 10),
data = c(10, 15, 12, 25, 30, 5, 8, 17, 9, 34))
Upvotes: 3
Reputation: 19746
Here is one approach:
library(tidyverse)
df %>%
arrange(Company, Year) %>% #in case the years are not in order (here they are)
group_by(Company) %>%
mutate(lagY = lag(Year), #get the lag year
lagD = lag(Data), #get lad Data
t = Year - lagY, #calculate time
CAGR = (Data / lagD)^(1/t) - 1) %>% #calculate CAGR
select(-lagY, -lagD, -t) #remove unwanted variables
#output:
Company Year Variables Data CAGR
<fct> <int> <fct> <int> <dbl>
1 ABC 2000 Revenue 10 NA
2 ABC 2001 Revenue 15 0.500
3 ABC 2002 Revenue 12 - 0.200
4 ABC 2003 Revenue 25 1.08
5 ABC 2004 Revenue 30 0.200
6 CDE 2000 Revenue 5 NA
7 CDE 2001 Revenue 8 0.600
8 CDE 2002 Revenue 17 1.12
9 CDE 2003 Revenue 9 - 0.471
10 CDE 2004 Revenue 34 2.78
Or a bit denser without making the intermediate variables:
df %>%
arrange(Company, Year) %>%
group_by(Company) %>%
mutate(CAGR = (Data/lag(Data))^(1/(Year-lag(Year))) - 1)
data:
df <- read.table(text ="Company Year Variables Data
ABC 2000 Revenue 10
ABC 2001 Revenue 15
ABC 2002 Revenue 12
ABC 2003 Revenue 25
ABC 2004 Revenue 30
CDE 2000 Revenue 5
CDE 2001 Revenue 8
CDE 2002 Revenue 17
CDE 2003 Revenue 9
CDE 2004 Revenue 34", header = T)
Upvotes: 3