Reputation: 106
I am struggeling to calculate the percent difference between the annual net sales for a company, with taken into account NA's.
Here's an sample of the data:
dt <- data.table(lpermno = c(10065, 10065, 10065, 10065, 59328, 61241, 59328, 61241, 59328, 61241, 59328, 61241), fyear = c(2001, 2002, 2003, 2004, 2001, 2001, 2002, 2002, 2003, 2003, 2004, 2004), sale = c(NA, NA, NA, NA, 26539, 3891.754, 26764, 2697.029, 30141, 3519.168, 34209, 5001.435))
lpermno fyear sale
10065 2001 NA
10065 2002 NA
10065 2003 NA
10065 2004 NA
59328 2001 26539.000
59328 2002 26764.000
59328 2003 30141.000
59328 2004 34209.000
61241 2001 3891.754
61241 2002 2697.029
61241 2003 3519.168
61241 2004 5001.435
I'd like to calculate a new variable, called sales_change. This variable should be the percentage change for sale.
[sale_n]/[sale_n-1] for each company.
For the first observation of a company, the sales_change needs to be just 1.
I've read the following posts for guidance, but it didn't work out.
For the example data i gave above, the desired output would be:
output <- data.table(lpermno = c(10065, 10065, 10065, 10065, 59328, 59328, 59328, 59328, 61241, 61241, 61241, 61241), fyear = c(2001, 2002, 2003, 2004, 2001, 2002, 2003, 2004, 2001, 2002, 2003, 2004), sale = c(NA, NA, NA, NA, 3891.754, 2697.029, 3519.168, 5001.435, 26539, 26764, 30141, 34209), output = c(NA, NA, NA, NA, 1, 0.693011, 1.304831, 1.421198, 1, 1.008478, 1.126177, 1.134966))
lpermno fyear sale output
10065 2001 NA NA
10065 2002 NA NA
10065 2003 NA NA
10065 2004 NA NA
59328 2001 3891.754 1.000000
59328 2002 2697.029 0.693011
59328 2003 3519.168 1.304831
59328 2004 5001.435 1.421198
61241 2001 26539.000 1.000000
61241 2002 26764.000 1.008478
61241 2003 30141.000 1.126177
61241 2004 34209.000 1.134966
I'd appreciate some assistance. Thanks in advance.
Upvotes: 5
Views: 2137
Reputation: 784
relchange <- function(x) {
x[!is.na(x)] <- c(1, exp(diff(log(x[!is.na(x)]))))
return(x)
}
relchange(dt$sale)
[1] NA NA NA NA 1.0000000 0.6930112 1.3048314
[8] 1.4211981 5.3062771 1.0084781 1.1261770 1.1349657
Upvotes: 0
Reputation: 38510
Using data.table
, you could do the following:
dt[, pctchnge := sale / c(sale[1], head(sale, -1)), by="lpermno"][order(lpermno)]
Here, create a new variable with :=
, repeat the first month of sales, and drop the final month with tail for the denominator. perform calculations by lpermno. Then sort by lpermno.
this returns
lpermno fyear sale pctchnge
1: 10065 2001 NA NA
2: 10065 2002 NA NA
3: 10065 2003 NA NA
4: 10065 2004 NA NA
5: 59328 2001 26539.000 1.0000000
6: 59328 2002 26764.000 1.0084781
7: 59328 2003 30141.000 1.1261770
8: 59328 2004 34209.000 1.1349657
9: 61241 2001 3891.754 1.0000000
10: 61241 2002 2697.029 0.6930112
11: 61241 2003 3519.168 1.3048314
12: 61241 2004 5001.435 1.4211981
Upvotes: 6
Reputation: 72984
You could use by
which applies a function to your data frame split by lpermno
.
While splitted, relative change formula is applied onto sales
column of every sub data frame which are combined by cbind
with the new output
column. With if
we control for anyNA
cases and throw NA
since a meaningful mean cannot be calculated. With do.call(rbind)
we put the data frame back together.
do.call(rbind, by(dt, dt$lpermno, function(x)
if (anyNA(x$sale)) return(cbind(x, output=NA))
else return (cbind(x, output=c(1, 1/x$sale[-length(x$sale)]*c(x$sale[-1]))))))
# lpermno fyear sale output
# 1: 10065 2001 NA NA
# 2: 10065 2002 NA NA
# 3: 10065 2003 NA NA
# 4: 10065 2004 NA NA
# 5: 59328 2001 26539.000 1.0000000
# 6: 59328 2002 26764.000 1.0084781
# 7: 59328 2003 30141.000 1.1261770
# 8: 59328 2004 34209.000 1.1349657
# 9: 61241 2001 3891.754 1.0000000
# 10: 61241 2002 2697.029 0.6930112
# 11: 61241 2003 3519.168 1.3048314
# 12: 61241 2004 5001.435 1.4211981
Data
dt <- structure(list(lpermno = c(10065, 10065, 10065, 10065, 59328,
61241, 59328, 61241, 59328, 61241, 59328, 61241), fyear = c(2001,
2002, 2003, 2004, 2001, 2001, 2002, 2002, 2003, 2003, 2004, 2004
), sale = c(NA, NA, NA, NA, 26539, 3891.754, 26764, 2697.029,
30141, 3519.168, 34209, 5001.435)), row.names = c(NA, -12L), class = c("data.table",
"data.frame"))
Upvotes: 3