Reputation: 9763
What is the correct way to record the results of the inner lapply call here? My end goal is a dataframe with percentage_accuracy, statparam, and cutoff for each value tested. Is there a more "R" way to do this?
best<-lapply(1:100,function(i){
statval<-sample.int(c(1,0),100,replace=T)
lapply(1:100,function(j){
aaa<-statval+j*27
list(percentage_accuracy=aaa,statparam=i,cutoff=j)
})
})
Upvotes: 2
Views: 55
Reputation: 35314
Firstly, you're not using sample.int() correctly. Its first argument is treated as a scalar giving the number of items to sample from, so only the first element of c(1,0) is used. Your call therefore always samples from one item, namely 1, and there is no randomness. This differs from sample(), which samples from the elements of a vector when that vector has more than one element. Example:
sample.int(c(1,0),10L,T);
## [1] 1 1 1 1 1 1 1 1 1 1
sample(c(1,0),10L,T);
## [1] 1 0 1 0 0 0 0 0 1 1
Given that you need to sample from 0:1, you should be calling sample().
From your code, it looks like we can precompute the statparam and cutoff columns in one shot without running any loops (hidden or otherwise). We can also precompute a statval vector in one shot, after which the only remaining task is the multiplication and addition that complete the percentage_accuracy column. The tricky bit is getting the replications right: the columns must line up row by row, and each 100-element piece of the statval vector must be repeated once per cutoff value, because your code reuses the same statval throughout the inner loop.
Here's how I would do this:
set.seed(1L);
NI <- 100L;
NS <- 100L;
NJ <- 100L;
res <- data.frame(
percentage_accuracy=c(replicate(NI,rep(sample(0:1,NS,T),NJ))),
statparam=rep(seq_len(NI),each=NS*NJ),
cutoff=rep(seq_len(NJ),NI,each=NS)
);
res$percentage_accuracy <- res$percentage_accuracy+res$cutoff*27L;
str(res);
## 'data.frame': 1000000 obs. of 3 variables:
## $ percentage_accuracy: int 27 27 28 28 27 28 28 28 28 27 ...
## $ statparam : int 1 1 1 1 1 1 1 1 1 1 ...
## $ cutoff : int 1 1 1 1 1 1 1 1 1 1 ...
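To see how the rep() calls above line up, here is a minimal sketch of the same layout at a much smaller size. The sizes (2 outer iterations, 3 samples, 2 cutoffs) and the name small are illustrative assumptions, not part of the original answer.

```r
# Tiny version of the layout: 2 outer iterations, 3 samples each, 2 cutoffs.
NI <- 2L
NS <- 3L
NJ <- 2L

set.seed(1L)
# One statval block of length NS per outer iteration, repeated NJ times
# so the same draw is reused for every cutoff; c() flattens column by column.
statval <- c(replicate(NI, rep(sample(0:1, NS, TRUE), NJ)))

# statparam changes slowest: one value per block of NS*NJ rows.
statparam <- rep(seq_len(NI), each = NS * NJ)

# cutoff: each value held for NS rows, and the whole 1..NJ run repeated NI times.
cutoff <- rep(seq_len(NJ), times = NI, each = NS)

small <- data.frame(statval, statparam, cutoff)
print(small)
```

Within each statparam value, the first NS rows and the next NS rows carry the same statval draw, which mirrors how the original inner loop reuses statval for every j.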
Upvotes: 3
Reputation: 887118
We can convert each innermost list to a data.frame, rbind those together, and then rbind the results across the outermost loop.
d1 <- do.call(rbind, lapply(best, function(x) do.call(rbind, lapply(x, data.frame) )))
str(d1)
#'data.frame': 1000000 obs. of 3 variables:
#$ percentage_accuracy: num 28 28 28 28 28 28 28 28 28 28 ...
#$ statparam : int 1 1 1 1 1 1 1 1 1 1 ...
#$ cutoff : int 1 1 1 1 1 1 1 1 1 1 ...
If this needs to be faster, use rbindlist from data.table:
library(data.table)
d2 <- rbindlist(lapply(best, function(x) rbindlist(lapply(x, data.frame))))
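The 1,000,000-row count comes from percentage_accuracy being a length-100 vector in every inner list, so each inner list turns into a 100-row data.frame with the scalar fields recycled. A small sketch of that behaviour, using made-up values and the hypothetical names one, two, and stacked:

```r
# One inner result shaped like the question's list (values are made up).
one <- list(percentage_accuracy = c(27, 28, 28), statparam = 1L, cutoff = 1L)

# data.frame() recycles the length-1 fields to match the longest column.
df_one <- data.frame(one)

# A second result for the next cutoff, then stack both the same way
# do.call(rbind, ...) stacks the inner lists in the answer above.
two <- list(
  one,
  list(percentage_accuracy = c(54, 55, 55), statparam = 1L, cutoff = 2L)
)
stacked <- do.call(rbind, lapply(two, data.frame))
print(stacked)
```

If one row per (statparam, cutoff) pair is wanted instead, percentage_accuracy would first have to be reduced to a single number inside the inner function.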
Upvotes: 3