Reputation: 43
I'm using R to investigate how investment returns affect a person's pension account. To do this, I calculate the development of the pension account from age 25 until retirement at age 70 for 1000 different return scenarios. I'm using the variables expenses (e), monthly deposit (m), return in percent (r), account balance (y) and return in euros (x). They are all stored in data frames with dimensions 46 x 1000.
I have successfully managed to calculate it using a for loop. However, this is very slow, and since I'm doing a lot of these I am wondering if someone has an idea for speeding up the code. I have tried apply functions and vectorization but cannot get them to work. My problem is that I have to calculate the numbers for year i before calculating the numbers for year i+1. I have searched the internet for a solution, but have a hard time finding answers that apply to my specific problem. I should note that I'm still pretty new to R.
I have written a simplified version of the code I'm using:
    for (i in 3:46) {
      x[i-1, ] <- (y[i-1, ] + m[i-1, ]*6 - 0.5*e[i-1, ]) * r[i-1, ]
      y[i, ]   <- y[i-1, ] + x[i-1, ] - e[i-1, ] + m[i-1, ]*12
    }
I hope someone is able to help, and thanks in advance.
Best regards Rasmus
Upvotes: 1
Views: 123
Reputation: 5590
Your process looks to me like it needs the loop, since each iteration depends on the one before it. As @Gregor de Cillia mentions in the comments, you could do this in C++ for a speed improvement.
First, set up some data.
    set.seed(1)

    # Inputs as 46 x 1000 numeric matrices (one column per scenario)
    e <- matrix( data = rnorm( n = 46000, mean = 1000, sd = 200 ),
                 nrow = 46,
                 ncol = 1000 )
    m <- matrix( data = rnorm( n = 46000, mean = 2000, sd = 200 ),
                 nrow = 46,
                 ncol = 1000 )
    # Test values only; r is used as a plain multiplier in the formulas
    r <- matrix( data = rnorm( n = 46000, mean = 4, sd = 0.5 ),
                 nrow = 46,
                 ncol = 1000 )

    # Outputs: x has one row fewer than y, y[1,] holds the starting balances
    x <- matrix( data = NA_real_, nrow = 45, ncol = 1000 )
    y <- matrix( data = NA_real_, nrow = 46, ncol = 1000 )
    y[1,] <- rnorm( n = 1000, 10000, 1000 )
Then define a C++ function in an Rcpp file and load it with Rcpp::sourceCpp(). It returns a list with your two matrices x and y as list items:
    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    List pension( NumericMatrix e,
                  NumericMatrix m,
                  NumericMatrix r,
                  NumericVector yfirstrow ) {
      int ncols = e.cols();
      int nrows = e.rows();
      NumericMatrix x( nrows - 1, ncols );
      NumericMatrix y( nrows, ncols );
      // starting balances for all scenarios
      y( 0, _ ) = yfirstrow;
      // each row depends on the previous one, so the loop runs over years
      for( int i = 1; i < nrows; i++ ) {
        x( i-1, _ ) = ( y( i-1, _ ) + m( i-1, _ ) * 6 - 0.5 * e( i-1, _ ) ) * r( i-1, _ );
        y( i, _ ) = y( i-1, _ ) + x( i-1, _ ) - e( i-1, _ ) + m( i-1, _ ) * 12;
      }
      List ret;
      ret["x"] = x;
      ret["y"] = y;
      return ret;
    }
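If you want to check the recurrence itself without R in the picture, here is a minimal standalone sketch for a single scenario (one column), using std::vector instead of Rcpp types. The names AccountPath and simulate_account are made up for this illustration and are not part of Rcpp or of the function above. Unlike the matrix version, this sketch uses every input element, so you would pass 45 yearly values to get 46 balances.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for one scenario's results (not part of Rcpp).
struct AccountPath {
    std::vector<double> x;  // yearly return in euros, one entry per simulated year
    std::vector<double> y;  // account balance, one entry more than x
};

// Runs the same recurrence as the loop above for a single scenario.
// e, m and r hold one value per year; r is used as a plain multiplier.
AccountPath simulate_account(const std::vector<double>& e,
                             const std::vector<double>& m,
                             const std::vector<double>& r,
                             double y0) {
    assert(e.size() == m.size() && m.size() == r.size());
    const std::size_t years = e.size();

    AccountPath path;
    path.x.resize(years);
    path.y.resize(years + 1);
    path.y[0] = y0;  // starting balance

    // Year i+1 depends on year i, so the steps have to run in order.
    for (std::size_t i = 0; i < years; ++i) {
        path.x[i] = (path.y[i] + m[i] * 6 - 0.5 * e[i]) * r[i];
        path.y[i + 1] = path.y[i] + path.x[i] - e[i] + m[i] * 12;
    }
    return path;
}
```

For example, simulate_account({100, 100}, {10, 10}, {0.5, 0.5}, 1000) gives the balances 1000, 1525 and 2312.5, which you can verify by hand against the two formulas.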
Compare the two methods for speed.
    microbenchmark::microbenchmark(
      R = {
        for (i in 2:46) {
          x[i-1,] <- unlist( (y[i-1,] + m[i-1,]*6 - 0.5*e[i-1,] ) * r[i-1,] )
          y[i,]   <- unlist( y[i-1,] + x[i-1,] - e[i-1,] + m[i-1,]*12 )
        }
      },
      cpp = {
        cppList <- pension( e, m, r, y[1,] )
      },
      times = 100
    )
Make sure the outputs match:
    > identical( x, cppList$x )
    [1] TRUE
    > identical( y, cppList$y )
    [1] TRUE
The speed test results:
    Unit: microseconds
     expr      min       lq     mean   median       uq       max neval
        R 3309.962 3986.569 6961.838 5244.479 6219.215 96576.592   100
      cpp  879.713  992.229 1266.014 1124.345 1273.691  3041.966   100
So the Rcpp solution is around 5x faster here, but to be honest, the R loop you've made isn't too shabby for the dataset you're working with (with only 45 iterations, the overhead of the R loop isn't much of a hindrance). If you really need the speed, C++ can help.
Upvotes: 4