Reputation: 25
I'm trying to learn how to use function and apply instead of for loops, as they are supposedly faster. Can anyone give me advice on how to change the following code to reduce the run time?
The goal is to make RF have the same properties as Dates, but with the corresponding Euribor returns instead of the "dates" in Dates. The dates are serial-number dates in both Euribor and Dates (class: numeric).
Example data (the output of this code is similar to my input):
Dates=matrix(NA,4,10)
Dates[1,1:8]=seq(3610,3617,1)
Dates[2,1:10]=seq(3613,3622,1)
Dates[3,1:5]=seq(3615,3619,1)
Dates[4,1:7]=seq(3616,3622,1)
Euribor=matrix(0,2,51)
Euribor[1,]=seq(3600,3650,1)
Euribor[2,]=rnorm(51)
This solution returns the correct output, but takes a very long time with a 4500x4700 matrix.
RF = matrix(0,nrow(Dates),ncol(Dates))
for (i in 1:nrow(Dates)){
In=grep(Dates[i,1],Euribor[1,])
end=sum(!is.na(Dates[i,]))
RF[i,1:end]=as.matrix(Euribor[2,In:(In+end-1)])
}
Thank you in advance for any help.
Upvotes: 1
Views: 52
Reputation: 2806
Dates=matrix(NA,4,10)
Dates[1,1:8]=seq(3610,3617,1)
Dates[2,1:10]=seq(3613,3622,1)
Dates[3,1:5]=seq(3615,3619,1)
Dates[4,1:7]=seq(3616,3622,1)
Euribor=matrix(0,2,51)
Euribor[1,]=seq(3600,3650,1)
Euribor[2,]=rnorm(51)
RF = matrix(0,nrow(Dates),ncol(Dates))
for (i in 1:nrow(Dates)){
In=grep(Dates[i,1],Euribor[1,])
end=sum(!is.na(Dates[i,]))
RF[i,1:end]=as.matrix(Euribor[2,In:(In+end-1)])
}
RF2 = matrix(Euribor[2,match(c(Dates), Euribor[1,])], nrow = nrow(Dates), ncol = ncol(Dates))
So, RF2 is the fast way to do this. It should be the same as RF, except that the unused cells are NA instead of 0.
> RF
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] -0.0819133 -0.08336513 0.6926775 1.0500598 -0.5244457 1.1804117 1.7349849 1.3002456 0.0000000 0.0000000
[2,] 1.0500598 -0.52444574 1.1804117 1.7349849 1.3002456 -0.7438148 -1.2804350 0.9480801 -0.7692101 0.3189216
[3,] 1.1804117 1.73498487 1.3002456 -0.7438148 -1.2804350 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[4,] 1.7349849 1.30024557 -0.7438148 -1.2804350 0.9480801 -0.7692101 0.3189216 0.0000000 0.0000000 0.0000000
> RF2
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] -0.0819133 -0.08336513 0.6926775 1.0500598 -0.5244457 1.1804117 1.7349849 1.3002456 NA NA
[2,] 1.0500598 -0.52444574 1.1804117 1.7349849 1.3002456 -0.7438148 -1.2804350 0.9480801 -0.7692101 0.3189216
[3,] 1.1804117 1.73498487 1.3002456 -0.7438148 -1.2804350 NA NA NA NA NA
[4,] 1.7349849 1.30024557 -0.7438148 -1.2804350 0.9480801 -0.7692101 0.3189216 NA NA NA
We can replace the NAs with 0s like this:
RF2[is.na(RF2)] = 0
> RF2
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] -0.0819133 -0.08336513 0.6926775 1.0500598 -0.5244457 1.1804117 1.7349849 1.3002456 0.0000000 0.0000000
[2,] 1.0500598 -0.52444574 1.1804117 1.7349849 1.3002456 -0.7438148 -1.2804350 0.9480801 -0.7692101 0.3189216
[3,] 1.1804117 1.73498487 1.3002456 -0.7438148 -1.2804350 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[4,] 1.7349849 1.30024557 -0.7438148 -1.2804350 0.9480801 -0.7692101 0.3189216 0.0000000 0.0000000 0.0000000
Edit: I figured I should probably explain how this works. Essentially, all we need is the index in Euribor of each value in Dates. I figured the easiest way to get it was to collapse Dates into a vector, match those values against the first row of Euribor, and take the corresponding values from row 2 of Euribor.
Collapsing Dates into a vector goes column by column, and matrix also fills column by column by default, so the result is rebuilt in exactly the shape we're looking for.
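To make the column-by-column round trip concrete, here is a minimal sketch on a tiny made-up matrix. The names m, v, lookup, idx and back are hypothetical and used only for illustration; they are not part of the data above.

```r
# Tiny hypothetical matrix with one missing cell, for illustration only
m <- matrix(c(3610, 3613, 3611, NA), nrow = 2)

# c() drops the dimensions and flattens column by column
v <- c(m)

# Hypothetical lookup vector playing the role of Euribor[1, ]
lookup <- c(3609, 3610, 3611, 3612, 3613)

# match() gives the position of each value in lookup; NA stays NA
idx <- match(v, lookup)

# matrix() fills column by column, so the original layout comes back
back <- matrix(v, nrow = nrow(m), ncol = ncol(m))
identical(back, m)
```

Because both steps use the same column order, no transposing or reordering is needed between the flatten and the rebuild.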
Finally, we can just swap out all the NAs at the end, and that part is pretty easy.
Since we've removed the need for the for loop, this will be much faster. I'm not sure how we could use an apply function here. There probably is a way, but it isn't needed to speed this up.
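For what it's worth, one possible apply version is sketched below, using the same example data. It is only an illustration, not a recommendation: RF_apply is a made-up name, set.seed(1) is an assumption added so the example is reproducible, and no timing is claimed for it. It still calls match once per row, so there is no reason to expect it to beat the single vectorised match.

```r
set.seed(1)  # assumption: fixed seed, not in the original example

# Example data from the question
Dates <- matrix(NA, 4, 10)
Dates[1, 1:8]  <- seq(3610, 3617, 1)
Dates[2, 1:10] <- seq(3613, 3622, 1)
Dates[3, 1:5]  <- seq(3615, 3619, 1)
Dates[4, 1:7]  <- seq(3616, 3622, 1)
Euribor <- matrix(0, 2, 51)
Euribor[1, ] <- seq(3600, 3650, 1)
Euribor[2, ] <- rnorm(51)

# Row-wise lookup with apply(). apply() returns one column per row of Dates,
# so the result has to be transposed back with t().
RF_apply <- t(apply(Dates, 1, function(d) Euribor[2, match(d, Euribor[1, ])]))
RF_apply[is.na(RF_apply)] <- 0

# Vectorised version from above, for comparison
RF2 <- matrix(Euribor[2, match(c(Dates), Euribor[1, ])],
              nrow = nrow(Dates), ncol = ncol(Dates))
RF2[is.na(RF2)] <- 0

all.equal(RF_apply, RF2)
```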
Upvotes: 1