How can I achieve the same as below without using a for-loop?
df1 = data.frame(val = c("a", "c", "c", "b", "e"))
m1 = matrix(0, nrow = nrow(df1), ncol = length(c("a", "b", "c", "d", "e")))
colnames(m1) = c("a", "b", "c", "d", "e")
for (i in 1:nrow(df1)) {
  m1[i, df1[i, 1]] = 1  # For each entry in the data frame, mark the respective column as 1
}
You have a few strange things in your code. First, df1 is not needed at all: a one-column data.frame adds nothing over a plain vector here, so val = c("a", "c", "c", "b", "e") is enough. Also, as others suggested, there are more compact (and some more efficient) ways to achieve the same thing. However, if your actual problem involves a much larger amount of data and you find for loops easier to write, then you should consider writing the loop in C++, where for loops are much faster.
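As an illustration of one of those more compact approaches (a sketch, not taken from the original post), base R's outer() can compare every value against every column name at once and build the whole 0/1 matrix without an explicit loop. The column set cols is assumed to be the five letters from the question:

```r
val  <- c("a", "c", "c", "b", "e")
cols <- c("a", "b", "c", "d", "e")  # assumed column set, as in the question

# outer() evaluates val[i] == cols[j] for every pair (i, j), giving a
# length(val) x length(cols) logical matrix; multiplying by 1 turns
# TRUE/FALSE into 1/0.
m2 <- 1 * outer(val, cols, "==")
colnames(m2) <- cols
print(m2)
```

Because it compares labels rather than positions, this produces one 1 per row in the column whose name matches the value.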
Here is a benchmark I ran to compare R and C++ for loops, using a function that adds up the first n integers (I ran the test with n = 100,000).
Here is the code:
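library(Rcpp)
library(rbenchmark)

# Sum 1..n with a C++ for loop. The accumulator is a double because the
# result for n = 100000 (5000050000) does not fit in a 32-bit int.
cppFunction(
'double cppSum(int n) {
  double s = 0;
  for (int i = 1; i <= n; i++) {
    s += i;
  }
  return s;
}'
)

# The same sum with an R for loop
rSum <- function(n) {
  s = 0
  for (i in 1:n) {
    s = s + i
  }
  return(s)
}

n = 100000
benchmark(rSum(n), cppSum(n))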
And here is the result:
       test replications elapsed relative user.self sys.self user.child sys.child
2 cppSum(n)          100   0.008     1.00      0.00        0          0         0
1   rSum(n)          100   2.790   348.75      2.79        0          0         0
The relative column shows that the R function is 348.75 times slower than the C++ function. In computationally intensive processes, moving loops to C++ can be a great optimization. Once, I was running a for loop nested inside another loop, and it seemed to take forever to finish. When I replaced the R for loop with a C++ one, it finished in a couple of minutes.
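A quick sanity check on the size of the benchmarked sum (a sketch, not part of the original benchmark): for n = 100000 the total 1 + 2 + ... + n = n(n + 1)/2 is 5,000,050,000, which is larger than the largest 32-bit signed integer, .Machine$integer.max. A C++ accumulator declared as int would therefore overflow (undefined behavior for signed integers in C++), while a double, like the numeric value rSum() accumulates in, holds it exactly.

```r
n <- 100000

closed_form <- n * (n + 1) / 2             # Gauss's formula for 1 + 2 + ... + n
vectorized  <- sum(as.numeric(seq_len(n))) # loop-free sum, accumulated as doubles

print(c(closed_form = closed_form, vectorized = vectorized))
print(closed_form > .Machine$integer.max)  # does the sum overflow a 32-bit int?
```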
[Edit]
This example does not solve your actual problem. The original question asked for an alternative to the slow R for loop, so I suggested a faster alternative: the C++ for loop. The working example does not use your data, because it is too small for any meaningful benchmarking. Instead, I use a loop with 100K iterations so that the difference between the two loops is visible.
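This
f <- function(m1, df) {
  for (i in 1:nrow(df))
    m1[i, df[i, 1]] = 1
  return(m1)
}
is equivalent to
g <- function(m1, df) {
  # Index matrix with one (row, column) pair per row of df; this relies on
  # df[, 1] being a factor, whose integer codes cbind() uses as column numbers
  m1[cbind(seq_len(nrow(df)), df[, 1])] <- 1
  return(m1)
}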
The latter is faster for this particular example (timed with the microbenchmark package):
> microbenchmark(f(m1,df1),g(m1,df1))
Unit: microseconds
       expr     min      lq      mean  median      uq     max neval cld
 f(m1, df1) 167.085 174.885 194.58999 185.969 200.132 342.379   100   b
 g(m1, df1)  20.116  22.990  27.12403  24.222  27.300 158.053   100  a
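One caveat worth checking before relying on g (an observation about R versions, not from the original answer): since R 4.0.0, data.frame() keeps strings as character instead of converting them to factors. A cbind() of an integer and a character vector yields a character matrix, which R matches against the dimnames of m1; m1 has no row names, so the assignment should stop with a "subscript out of bounds" error. With a factor column the integer codes are used instead, and with the question's data "e" is level 4 of a, b, c, e, so it lands in column "d". A sketch that sidesteps both issues looks up each column by name with match(); the helper name h is hypothetical:

```r
# Hypothetical helper: vectorized version of the loop that matches by label.
h <- function(m1, df) {
  # Column position of each value, looked up by name; as.character() makes
  # this independent of whether df[, 1] is a factor or a character vector.
  cols <- match(as.character(df[, 1]), colnames(m1))
  # Two-column integer index matrix: row i, column cols[i].
  m1[cbind(seq_len(nrow(df)), cols)] <- 1
  m1
}

lab <- c("a", "b", "c", "d", "e")
m1  <- matrix(0, nrow = 5, ncol = length(lab), dimnames = list(NULL, lab))
df_chr <- data.frame(val = c("a", "c", "c", "b", "e"), stringsAsFactors = FALSE)
df_fac <- data.frame(val = factor(c("a", "c", "c", "b", "e")))

res_chr <- h(m1, df_chr)
res_fac <- h(m1, df_fac)
print(res_chr)
```

Both calls give the same matrix, with "e" marked in column "e" and column "d" left at zero.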
Note, however,