Gabor Csardi

Reputation: 10825

R: perform parameter sweep and collect results in long data frame

I am looking for the right R idiom to run a function over a set of parameters and collect the results in a long data frame. Imagine that you have the following toy function:

fun <- function(sd, mean, foobar = "foobar") {
  list(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}

Now you want to run fun over different values of sd and mean:

par_sd <- rep(1:5, 3)
par_mean <- rep(0:2, each = 5)
pars <- data.frame(sd = par_sd, mean = par_mean)

I want to run fun for the parameters in each row of pars, and collect the results in a data frame with columns sd, mean, pos, value. Here is a rather clumsy solution:

set.seed(42)

## Run fun
res <- lapply(seq_len(nrow(pars)), function(x) {
  do.call(fun, as.list(pars[x, ]))
})

## Select the result we need
res <- lapply(res, "[[", "random")

## Make it a single data frame
res <- do.call(rbind, res)

## Together with the parameters
res <- as.data.frame(cbind(sd = par_sd, mean = par_mean, res))
colnames(res) <- c("sd", "mean", 1:10)

## Make it a long data frame
res <- reshape2::melt(res, id.vars = c("sd", "mean"),
                      variable.name = "pos", value.name = "value")

## Done
res[1:5,]
#>   sd mean pos      value
#> 1  1    0   1 2.37095845
#> 2  2    0   1 3.60973931
#> 3  3    0   1 0.08008422
#> 4  4    0   1 2.82180049
#> 5  5    0   1 2.02999300

Is there a simpler way to do this? Does anyone know of a package that does this kind of thing? My quick search did not turn up any good results.

Upvotes: 1

Views: 612

Answers (2)

baptiste

Reputation: 77106

If you're willing to amend fun() to return a data.frame, I find the most elegant solution is plyr's mdply.

fun <- function(sd, mean, foobar = "foobar") {
  data.frame(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}

par_sd <- rep(1:5, 3)
par_mean <- rep(0:2, each = 5)
pars <- data.frame(sd = par_sd, mean = par_mean)

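library(plyr)  # provides mdply()

## Call fun once per row of pars; the parameter columns are prepended to each result
results <- mdply(pars, fun, foobar = "stuff")
str(results)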

Upvotes: 1

IRTFM

Reputation: 263431

mapply would seem a good fit:

> str(with(pars, mapply(fun, sd=sd, mean=mean) ) )
List of 30
 $ : num [1:10] 3.16 2.28 2.84 1.49 3.43 ...
 $ : chr "foobar"
 $ : num [1:10] 3.429 0.157 0.583 1.542 6.485 ...
 $ : chr "foobar"
 $ : num [1:10] -4.56 -1.51 -1.33 7.16 3.21 ...
 $ : chr "foobar"
 $ : num [1:10] -2.275 2.225 4.196 0.962 15.739 ...
 $ : chr "foobar"
 $ : num [1:10] 6.23 10.08 2.85 6.81 4.51 ...
 $ : chr "foobar"
 $ : num [1:10] 1.65 3.15 5.62 5.91 6.14 ...
 $ : chr "foobar"
 $ : num [1:10] 4.26 1.95 7.33 2.72 6.29 ...
 $ : chr "foobar"
 $ : num [1:10] 7.53 6.74 3.6 6.43 3.08 ...
 $ : chr "foobar"
 $ : num [1:10] -0.4181 -0.0584 5.5812 1.038 8.2482 ...
 $ : chr "foobar"
 $ : num [1:10] 0.2377 4.8557 5.2177 -0.0706 2.0434 ...
 $ : chr "foobar"
 $ : num [1:10] 2.95 4.3 5.26 8.58 5.81 ...
 $ : chr "foobar"
 $ : num [1:10] -0.85 4.83 8.19 5.17 6.58 ...
 $ : chr "foobar"
 $ : num [1:10] 3.59 11.46 6.29 6.57 2.97 ...
 $ : chr "foobar"
 $ : num [1:10] 0.117 3.142 10.473 10.196 5.56 ...
 $ : chr "foobar"
 $ : num [1:10] 13.03 2.64 -1.07 5.29 1.97 ...
 $ : chr "foobar"
 - attr(*, "dim")= int [1:2] 2 15
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "random" "foobar"
  ..$ : NULL
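
The structure printed above is a 2 x 15 matrix whose cells are list elements: one row for "random" and one for "foobar", with one column per row of pars. As a minimal sketch (the names m and wide are only illustrative, and fun is the list-returning version from the question), the "random" row can be pulled out and bound into a plain numeric matrix:

```r
# Toy function from the question: returns a list, not a data frame.
fun <- function(sd, mean, foobar = "foobar") {
  list(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
pars <- data.frame(sd = rep(1:5, 3), mean = rep(0:2, each = 5))

set.seed(42)

# Simplified result: a list-matrix with rows "random" and "foobar"
m <- with(pars, mapply(fun, sd = sd, mean = mean))
dim(m)

# Each cell of the "random" row holds one numeric vector of length 10;
# binding them row-wise gives one matrix row per parameter combination.
wide <- do.call(rbind, m["random", ])
dim(wide)
```

This is essentially the wide table the question builds before calling melt, so it still needs the parameter columns and a reshape to become long.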

By default, mapply attempts to simplify its result. To keep each result as a separate object, set SIMPLIFY=FALSE (the output below comes from a version of fun that returns a data.frame):

> str(with(pars, mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) )
List of 15
 $ :'data.frame':   10 obs. of  2 variables:
  ..$ random: num [1:10] 1.08 0.68 3.16 3.38 5.96 ...
  ..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
 $ :'data.frame':   10 obs. of  2 variables:
  ..$ random: num [1:10] 0.0927 5.1506 -1.0109 2.7136 2.1263 ...
  ..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
 $ :'data.frame':   10 obs. of  2 variables:
  ..$ random: num [1:10] -0.331 2.9 -1.705 5.471 4.712 ...
  ..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
 [remaining output snipped]

And if you need them in one stacked data frame, wrap the call in do.call(rbind, ...):

> str(do.call( rbind, with(pars, mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) ))
'data.frame':   150 obs. of  2 variables:
 $ random: num  1 3.34 2.5 4.72 4.25 ...
 $ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1 ...

If you want the rows labeled with their sd and mean values, modify the constructor function so that it returns them as columns:

fun <- function(sd, mean, foobar = "foobar") {
  data.frame(random = rnorm(10) * sd + mean + 1:10,
             sd = sd, mean = mean, foobar = foobar)
}
str(do.call(rbind, with(pars,
                        mapply(fun, sd = sd, mean = mean, SIMPLIFY = FALSE))))
#---------------
'data.frame':   150 obs. of  4 variables:
 $ random: num  1.42 1.13 3.73 4.5 5.63 ...
 $ sd    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ mean  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1 ...
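
The question asked for the columns sd, mean, pos, value. The labeled approach above can be extended to produce that shape directly. The following is only a sketch in base R: fun_long is a hypothetical variant of fun (not from the question) that drops foobar and adds a pos column, and Map is used as a shorthand for mapply with SIMPLIFY=FALSE.

```r
# Hypothetical variant of the toy function: returns one long data frame
# per parameter combination, with the position index stored in `pos`.
fun_long <- function(sd, mean) {
  data.frame(sd = sd, mean = mean, pos = 1:10,
             value = rnorm(10) * sd + mean + 1:10)
}

pars <- data.frame(sd = rep(1:5, 3), mean = rep(0:2, each = 5))

set.seed(42)

# One data frame per row of pars, stacked into a single long data frame
res <- do.call(rbind, Map(fun_long, sd = pars$sd, mean = pars$mean))
rownames(res) <- NULL
head(res)
```

Note that the rows come out grouped by parameter combination, whereas the melt-based version in the question is ordered by pos first. Sorting with order(res$pos) would give a comparable layout if the order matters.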

Upvotes: 0
