Rilcon42

Reputation: 9763

Returning a list of lists as a data frame

What is the correct way to record the results of the inner lapply call here? My end goal is a dataframe with percentage_accuracy, statparam, and cutoff for each value tested. Is there a more "R" way to do this?

best<-lapply(1:100,function(i){
  statval<-sample.int(c(1,0),100,replace=T)
  lapply(1:100,function(j){
    aaa<-statval+j*27 
    list(percentage_accuracy=aaa,statparam=i,cutoff=j)
  })
})

Upvotes: 2

Views: 55

Answers (2)

bgoldst

Reputation: 35314

Firstly, you're not using sample.int() correctly. Its first argument is treated as a scalar n, the number of items to sample from (1 through n), not as a vector of values. This means your call is always sampling from one item, namely 1, so there is no randomness. sample() behaves differently: it draws from the elements of the vector you pass it. Example:

sample.int(c(1,0),10L,T);
## [1] 1 1 1 1 1 1 1 1 1 1
sample(c(1,0),10L,T);
## [1] 1 0 1 0 0 0 0 0 1 1

Given that you need to sample from 0:1, you should be calling sample().
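As a rough illustration of the difference, here is a minimal sketch of two ways to draw 0/1 values. The variable names and the seed are arbitrary choices for this example, and the sample.int() variant is only an alternative that relies on its documented behavior of drawing from 1..n.

```r
set.seed(42L)  # arbitrary seed, only so the run is repeatable

# sample() draws from the elements of the vector it is given
a <- sample(0:1, 10L, replace = TRUE)

# sample.int(n, ...) draws from 1..n, so subtract one to land on 0/1
b <- sample.int(2L, 10L, replace = TRUE) - 1L

# Both vectors contain only zeros and ones
stopifnot(all(a %in% 0:1), all(b %in% 0:1))
```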


From your code, it looks like the statparam and cutoff columns can be precomputed in one shot without running any loops (hidden or otherwise). The statval values can also be generated up front as a single vector, after which the only remaining work is the multiplication and addition that complete the percentage_accuracy column. The tricky bit is getting the replications right: the three columns must line up row for row, and each 100-element piece of the statval vector must be repeated once per cutoff value, since your code reuses the same statval throughout the inner loop.
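The replication pattern is easier to see on a tiny grid. The sketch below uses made-up sizes (2 outer values, 3 sampled values, 2 cutoffs) and a hand-written stand-in for one sampled block; the names are chosen for this illustration only.

```r
NI <- 2L  # number of outer (statparam) values
NS <- 3L  # number of sampled values per outer iteration
NJ <- 2L  # number of inner (cutoff) values

# Each statparam value covers one full block of NS * NJ rows
statparam <- rep(seq_len(NI), each = NS * NJ)
print(statparam)
## [1] 1 1 1 1 1 1 2 2 2 2 2 2

# Within a block, each cutoff covers NS rows; the pattern repeats NI times
cutoff <- rep(seq_len(NJ), times = NI, each = NS)
print(cutoff)
## [1] 1 1 1 2 2 2 1 1 1 2 2 2

# One sampled block (a fixed stand-in here) is reused for every cutoff
block <- c(0L, 1L, 1L)
print(rep(block, NJ))
## [1] 0 1 1 0 1 1
```

Note that when rep() gets both times and each, the each expansion is applied first and the result is then repeated times over.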

Here's how I would do this:

set.seed(1L);
NI <- 100L;
NS <- 100L;
NJ <- 100L;
res <- data.frame(
    percentage_accuracy=c(replicate(NI,rep(sample(0:1,NS,T),NJ))),
    statparam=rep(seq_len(NI),each=NS*NJ),
    cutoff=rep(seq_len(NJ),NI,each=NS)
);
res$percentage_accuracy <- res$percentage_accuracy+res$cutoff*27L;
str(res);
## 'data.frame': 1000000 obs. of  3 variables:
##  $ percentage_accuracy: int  27 27 28 28 27 28 28 28 28 27 ...
##  $ statparam          : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ cutoff             : int  1 1 1 1 1 1 1 1 1 1 ...
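
One way to gain confidence in the alignment is to compare the vectorized construction against a plain nested loop on a small grid. This is only a sketch: the sizes are arbitrary, the names nested, loop_df and vec_df are invented for the check, and the nested version uses sample() rather than the question's sample.int() call.

```r
NI <- 3L; NS <- 4L; NJ <- 2L  # small, arbitrary sizes for the check

# Nested version: one sample per outer value, reused for every cutoff
set.seed(1L)
nested <- lapply(seq_len(NI), function(i) {
    statval <- sample(0:1, NS, replace = TRUE)
    lapply(seq_len(NJ), function(j) {
        data.frame(percentage_accuracy = statval + j * 27L,
                   statparam = i, cutoff = j)
    })
})
# Flatten one level, then stack all the small data frames
loop_df <- do.call(rbind, unlist(nested, recursive = FALSE))

# Vectorized version, same seed so the same values are drawn in the same order
set.seed(1L)
vec_df <- data.frame(
    percentage_accuracy = c(replicate(NI, rep(sample(0:1, NS, TRUE), NJ))),
    statparam = rep(seq_len(NI), each = NS * NJ),
    cutoff = rep(seq_len(NJ), NI, each = NS)
)
vec_df$percentage_accuracy <- vec_df$percentage_accuracy + vec_df$cutoff * 27L

# Column by column, the two constructions should agree
stopifnot(
    identical(loop_df$percentage_accuracy, vec_df$percentage_accuracy),
    identical(loop_df$statparam, vec_df$statparam),
    identical(loop_df$cutoff, vec_df$cutoff)
)
```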

Upvotes: 3

akrun

Reputation: 887118

We can convert each innermost list to a data.frame, rbind those together, and then rbind the results again across the outermost loop.

d1 <- do.call(rbind, lapply(best, function(x) do.call(rbind, lapply(x, data.frame))))
str(d1)
#'data.frame':   1000000 obs. of  3 variables:
#$ percentage_accuracy: num  28 28 28 28 28 28 28 28 28 28 ...
#$ statparam          : int  1 1 1 1 1 1 1 1 1 1 ...
#$ cutoff             : int  1 1 1 1 1 1 1 1 1 1 ...
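
The row count follows from how data.frame() treats each innermost list: the length-1 statparam and cutoff entries are recycled to match the longer percentage_accuracy vector. A minimal sketch with a made-up three-value element (the names one and df are invented for this example):

```r
# A stand-in for one innermost list, with a short percentage_accuracy vector
one <- list(percentage_accuracy = c(28, 29, 28), statparam = 1L, cutoff = 1L)

# data.frame() recycles the scalars, giving one row per accuracy value
df <- data.frame(one)

stopifnot(nrow(df) == 3L, all(df$statparam == 1L), all(df$cutoff == 1L))
```

With the question's sizes, each innermost list therefore becomes 100 rows, and 100 x 100 of them stack to the 1,000,000 rows shown in the str() output.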

If this needs to be faster, use rbindlist from the data.table package:

library(data.table)
d2 <- rbindlist(lapply(best, function(x) rbindlist(lapply(x, data.frame))))

Upvotes: 3
