user2547708

Reputation: 3

How do I write a loop to random sample from multiple subsets of data?

I have a (probably simple) question that I can't figure out.

I'd like to write a loop (or use mapply or ddply?) to randomly sample three values from each of multiple subsets of the data, calculate the mean of each sample, and put the results in a data frame.

For example, here is a small portion of the data:

   BayStation    DIN Year
1       60069 0.0090 1998
2       60069 0.0060 1998
3       60069 0.0100 1998
4       60069 0.0020 1998
5       60069 0.0140 1998
6       60069 0.0110 1998
7       60081 0.0140 1998
8       60081 0.0140 1998
9       60081 0.0060 1998
10      60081 0.0020 1998
11      60081 0.0250 1998
12      60081 0.0140 1998
13      60081 0.0110 1998
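To make the answers below reproducible, the rows shown above can be typed in as a data frame. This is only a sketch: the name DIN1998 follows the question's code, and storing BayStation as a number (rather than a factor or string) is an assumption.

```r
# Rebuild the 13 example rows from the question as a data frame.
DIN1998 <- data.frame(
  BayStation = rep(c(60069, 60081), times = c(6, 7)),
  DIN = c(0.009, 0.006, 0.010, 0.002, 0.014, 0.011,          # station 60069
          0.014, 0.014, 0.006, 0.002, 0.025, 0.014, 0.011),  # station 60081
  Year = 1998                                                # recycled to every row
)

str(DIN1998)
```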

I want to subset by BayStation, randomly sample three DIN values for each BayStation, and calculate the mean. I know how to do this for one bay station:

test<-mean(sample(DIN1998$DIN[DIN1998$BayStation=="60081"], 
                  3, replace = FALSE, prob = NULL))

But I'd like to do this for an entire data frame with hundreds of stations. Can anyone tell me how, or give a big hint? Safe to say, my R skills are very basic. Thanks in advance!

Upvotes: 0

Views: 561

Answers (3)

Jilber Urbina

Reputation: 61154

Here's one approach

> set.seed(1)
> sapply(split(DIN1998$DIN, DIN1998$BayStation), function(x){
    mean(sample(x, 3))
  })
     60069      60081 
0.00900000 0.01666667 
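The sapply(split(...)) call above is a compact version of the loop the question describes. Written out as an explicit for loop that stores one row per station in a data frame, it might look like the sketch below; it assumes DIN1998 has the BayStation and DIN columns shown in the question, and the result column name sample.mean is just an illustrative choice.

```r
# Example rows from the question (assumed column names: BayStation, DIN).
DIN1998 <- data.frame(
  BayStation = rep(c(60069, 60081), times = c(6, 7)),
  DIN = c(0.009, 0.006, 0.010, 0.002, 0.014, 0.011,
          0.014, 0.014, 0.006, 0.002, 0.025, 0.014, 0.011)
)

set.seed(1)
stations <- unique(DIN1998$BayStation)

# Pre-allocate one row per station, then fill in each sampled mean.
result <- data.frame(BayStation = stations, sample.mean = NA_real_)
for (i in seq_along(stations)) {
  din <- DIN1998$DIN[DIN1998$BayStation == stations[i]]  # this station's values
  result$sample.mean[i] <- mean(sample(din, 3))          # mean of 3 random draws
}
result
```

The vectorised forms in the answers do the same work with less bookkeeping, but the loop makes each step explicit.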

If your data frame is large, you may want to use data.table:

> library(data.table)
> dt <- data.table(DIN1998)
> set.seed(1)
> dt[,list(Mean=mean(sample(DIN, 3))), by="BayStation"]
   BayStation       Mean
1:      60069 0.00900000
2:      60081 0.01666667

Another base R solution:

> set.seed(1)
> cbind(Mean.by.BayStation=with(DIN1998, 
                                by(DIN, BayStation, function(x)  
                                  mean(sample(x, 3)))))
      Mean.by.BayStation
60069         0.00900000
60081         0.01666667
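Each approach above calls sample(x, 3) on a station's values, which assumes every station has at least three observations. With fewer, sample() stops with an error (replace = FALSE is the default), and a length-one numeric x >= 1 is treated by sample() as 1:x. A guarded version might look like this; the helper name sample_mean, the choice to return NA for short groups, and the extra station 60099 are hypothetical, not part of the question's data or any answer.

```r
# Hypothetical helper: mean of n randomly chosen values, or NA if the group
# has fewer than n values. Indexing with sample.int() avoids sample()'s
# special case for a length-one numeric vector.
sample_mean <- function(x, n = 3) {
  if (length(x) < n) return(NA_real_)
  mean(x[sample.int(length(x), n)])
}

DIN1998 <- data.frame(
  BayStation = c(rep(60069, 6), rep(60081, 7), 60099),  # 60099: hypothetical one-value station
  DIN = c(0.009, 0.006, 0.010, 0.002, 0.014, 0.011,
          0.014, 0.014, 0.006, 0.002, 0.025, 0.014, 0.011,
          0.004)
)

set.seed(1)
means <- sapply(split(DIN1998$DIN, DIN1998$BayStation), sample_mean)
means
```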

Upvotes: 0

rrs

Reputation: 9903

If you want to use plyr:

library(plyr)
ddply(DIN1998, .(BayStation),
      summarise,
      sample.mean=mean(sample(DIN, 3, replace=FALSE, prob=NULL)))

With set.seed(1) you get:

  BayStation sample.mean
1      60069  0.00900000
2      60081  0.01666667

Upvotes: 1

Sven Hohenstein

Reputation: 81693

You can use tapply:

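with(DIN1998, tapply(DIN, BayStation, function(x) mean(sample(x, 3))))
# the 3 belongs inside sample(), not mean();
# one mean of 3 randomly sampled DIN values per BayStation (values depend on the seed)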

or aggregate:

aggregate(DIN ~ BayStation, DIN1998, function(x) mean(sample(x, 3)))
# a data frame with columns BayStation and DIN (the sampled mean);
# the values depend on the random seed
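The 3 has to go inside sample(). In mean(sample(x), 3), the 3 is matched to mean()'s second argument, trim, and mean() returns the median whenever trim is 0.5 or more. Because sample(x) with no size only permutes x, the result is the median of the whole group rather than a random three-value mean. A quick check using station 60069's values from the question (the variable names here are illustrative):

```r
din_60069 <- c(0.009, 0.006, 0.010, 0.002, 0.014, 0.011)
set.seed(1)

misplaced <- mean(sample(din_60069), 3)   # 3 is taken as mean(trim = 3)
misplaced == median(din_60069)            # TRUE: both are 0.0095

intended <- mean(sample(din_60069, 3))    # mean of three randomly drawn values
intended
```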

Upvotes: 0
