Leosar

Reputation: 2072

Creating a dataframe using the values in another dataframe R

I would like to generate a synthetic data set using the values stored in a data frame. In the new data frame I need n rows from a lognormal random distribution with a specified mean, so I tried this:

sp = '
species   CE_mean  Ph_mean     n
Apocal 0.6398000 6.233600   200
Aporos 0.6334615 6.518269   156
Apotra 0.8448980 6.561224    49
'
msp <- read.table(text=sp,header = TRUE)

spdf <- data.frame()

for( i in 1:nrow(msp))
{
  spm1 <- data.frame()
  spm1$CE <-rlnorm(n=msp$n[i],meanlog=msp$CE_mean[i],sdlog=0.1)
  spm1$Ph <-rlnorm(n=msp$n[i],meanlog=msp$Ph_mean[i],sdlog=0.1)
  spm1$species <- msp$species[i]
  spdf<-rbind(spdf,spm1)
}

But it doesn't work. How could I do this using dplyr?

Upvotes: 1

Views: 185

Answers (2)

Abdou

Reputation: 13274

This is a dplyr solution:

library(dplyr)

spdf <- msp %>% rowwise() %>%
    do(data.frame(species = .$species, 
       CE=rlnorm(n=.$n,meanlog=.$CE_mean,sdlog=0.1),
       Ph=rlnorm(n=.$n,meanlog=.$Ph_mean,sdlog=0.1),
       stringsAsFactors=FALSE)) %>%
    ungroup()

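This should yield something like the following (the exact values will differ on each run because no seed is set):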

   species       CE       Ph
*   <fctr>    <dbl>    <dbl>
1   Apocal 2.168593 538.4061
2   Apocal 1.868780 535.1687
3   Apocal 1.993015 503.7631
4   Apocal 1.764942 495.0502
5   Apocal 1.671921 503.3961
6   Apocal 2.013073 464.7946
7   Apocal 2.190407 538.6861
8   Apocal 1.668348 479.1846
9   Apocal 2.018912 443.7977
10  Apocal 1.802224 635.2461
# ... with 395 more rows

I hope this helps.
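As a possible alternative that avoids do() (which is marked as superseded in recent dplyr releases), here is a minimal sketch. It assumes the tidyr package is installed and relies on rlnorm() being vectorised over meanlog. The name spdf2 and the seed value are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same input as in the question
msp <- read.table(text = "
species   CE_mean  Ph_mean     n
Apocal 0.6398000 6.233600   200
Aporos 0.6334615 6.518269   156
Apotra 0.8448980 6.561224    49
", header = TRUE)

set.seed(123)  # arbitrary seed, used only to make the sketch reproducible

spdf2 <- msp %>%
  uncount(n) %>%  # repeat each species row n times; the n column is dropped
  mutate(CE = rlnorm(n(), meanlog = CE_mean, sdlog = 0.1),  # one draw per row
         Ph = rlnorm(n(), meanlog = Ph_mean, sdlog = 0.1)) %>%
  select(species, CE, Ph)
```

Because each row already carries its own CE_mean and Ph_mean after uncount(), a single mutate() call draws all values at once, and no row-wise grouping is needed.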

Upvotes: 1

aichao

Reputation: 7435

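I'm not sure dplyr is the best approach here. Your code fails because spm1 starts as a data frame with zero rows, so assigning a column of msp$n[i] values to it with $<- raises an error ("replacement has 200 rows, data has 0"). You can fix it by generating the vectors first and only then building the data frame: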

spdf <- data.frame()
for( i in 1:nrow(msp)) {
  CE <-rlnorm(n=msp$n[i],meanlog=msp$CE_mean[i],sdlog=0.1)
  Ph <-rlnorm(n=msp$n[i],meanlog=msp$Ph_mean[i],sdlog=0.1)
  species <- msp$species[i]
  spdf<-rbind(spdf,data.frame(CE=CE,Ph=Ph,species=species))
}

or:

spdf <- do.call(rbind,lapply(1:nrow(msp),function(i) data.frame(CE=rlnorm(n=msp$n[i],meanlog=msp$CE_mean[i],sdlog=0.1),
                                                                Ph=rlnorm(n=msp$n[i],meanlog=msp$Ph_mean[i],sdlog=0.1),
                                                                species=msp$species[i])))

With set.seed(123) called before generating the data, I get:

set.seed(123)
# ... run either version above to build spdf ...
spdf
##          CE       Ph species
##1   1.792753 634.9086  Apocal
##2   1.852956 581.0526  Apocal
##3   2.215927 496.2528  Apocal
##4   1.909518 538.0327  Apocal
##5   1.920775 488.9039  Apocal
## ...
##195 1.663161 481.1812  Apocal
##196 2.315258 592.2863  Apocal
##197 2.013493 471.6256  Apocal
##198 1.673091 554.5590  Apocal
##199 1.783688 449.2285  Apocal
##200 1.684135 491.8362  Apocal
##201 1.870313 673.9387  Aporos
##202 1.676312 642.6347  Aporos
##203 1.768243 664.1729  Aporos
##204 1.878695 636.0716  Aporos
##205 2.014822 623.2107  Aporos
## ...
##352 1.742361 618.8405  Aporos
##353 2.105457 692.9110  Aporos
##354 1.931784 730.0238  Aporos
##355 2.222545 753.2359  Aporos
##356 1.628345 663.1387  Aporos
##357 2.306046 752.1002  Apotra
##358 2.307643 752.1086  Apotra
##359 2.688663 597.0578  Apotra
##360 2.604928 733.6985  Apotra
##361 2.530301 778.9991  Apotra
## ...
##401 2.575855 717.4006  Apotra
##402 2.281315 701.8091  Apotra
##403 1.898625 877.7533  Apotra
##404 2.282586 726.9484  Apotra
##405 2.456843 696.0313  Apotra
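
If the loop is not needed at all, a fully vectorised base R version is also possible, since rlnorm() accepts a vector for meanlog. This is only a sketch: the names idx and spdf_vec are made up for illustration, and because the draws happen in a different order than in the loop (all CE values first, then all Ph values), the numbers will generally not match the output above row for row, even with the same seed.

```r
# Same input as in the question
msp <- read.table(text = "
species   CE_mean  Ph_mean     n
Apocal 0.6398000 6.233600   200
Aporos 0.6334615 6.518269   156
Apotra 0.8448980 6.561224    49
", header = TRUE)

set.seed(123)  # arbitrary seed, used only to make the sketch reproducible

# Row index of msp repeated n times per species: 1,1,...,2,2,...,3,3,...
idx <- rep(seq_len(nrow(msp)), times = msp$n)

spdf_vec <- data.frame(
  CE      = rlnorm(length(idx), meanlog = msp$CE_mean[idx], sdlog = 0.1),
  Ph      = rlnorm(length(idx), meanlog = msp$Ph_mean[idx], sdlog = 0.1),
  species = msp$species[idx]
)
```

This avoids growing a data frame with rbind() inside a loop, which tends to get slow as the number of rows increases.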

Upvotes: 2
