Reputation: 3
I have a (probably simple) question that I can't figure out.
I'd like to write a loop (or use mapply or ddply?) to randomly sample three values from each of multiple subsets of data, and find the mean value for that random sample and put it in a dataframe.
For example, here is a small portion of the data:
BayStation DIN Year
1 60069 0.0090 1998
2 60069 0.0060 1998
3 60069 0.0100 1998
4 60069 0.0020 1998
5 60069 0.0140 1998
6 60069 0.0110 1998
7 60081 0.0140 1998
8 60081 0.0140 1998
9 60081 0.0060 1998
10 60081 0.0020 1998
11 60081 0.0250 1998
12 60081 0.0140 1998
13 60081 0.0110 1998
I want to subset by BayStation, randomly sample three DIN values for each BayStation, and calculate the mean. I know how to do this for one bay station:
test<-mean(sample(DIN1998$DIN[DIN1998$BayStation=="60081"],
3, replace = FALSE, prob = NULL))
But I'd like to know how I could do this for an entire dataframe, with hundreds of stations. Can anyone tell me how to do this? Or give a big hint? Safe to say, my R skills are very basic. Thanks in advance!
Upvotes: 0
Views: 561
Reputation: 61154
Here's one approach
> set.seed(1)
> sapply(split(DIN1998$DIN, DIN1998$BayStation), function(x){
mean(sample(x, 3))
})
60069 60081
0.00900000 0.01666667
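Since the question asks for the result in a data frame, the named vector above can be wrapped up afterwards. A minimal sketch, assuming the sample rows from the question are rebuilt by hand as `DIN1998`; the column name `MeanDIN` is made up for illustration:

```r
# Rebuild the sample rows from the question (assumed layout of DIN1998)
DIN1998 <- data.frame(
  BayStation = rep(c(60069, 60081), times = c(6, 7)),
  DIN = c(0.009, 0.006, 0.010, 0.002, 0.014, 0.011,
          0.014, 0.014, 0.006, 0.002, 0.025, 0.014, 0.011),
  Year = 1998
)

set.seed(1)  # only to make the random draws repeatable
means <- sapply(split(DIN1998$DIN, DIN1998$BayStation),
                function(x) mean(sample(x, 3)))

# Turn the named vector into a two-column data frame
# ("MeanDIN" is a hypothetical column name)
result <- data.frame(BayStation = names(means),
                     MeanDIN = unname(means),
                     row.names = NULL)
result
```

The exact means depend on the random draw and on the R version's sampling algorithm, so they may not match the numbers printed above.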
If your data.frame is too large, then you may want to use data.table
> library(data.table)
> dt <- data.table(DIN1998)
> set.seed(1)
> dt[,list(Mean=mean(sample(DIN, 3))), by="BayStation"]
BayStation Mean
1: 60069 0.00900000
2: 60081 0.01666667
Another base R solution
> set.seed(1)
> cbind(Mean.by.BayStation=with(DIN1998,
by(DIN, BayStation, function(x)
mean(sample(x, 3)))))
Mean.by.BayStation
60069 0.00900000
60081 0.01666667
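One caveat when running this over hundreds of stations: `sample(x, 3)` stops with an error if a station has fewer than three DIN values. A hedged sketch of one way to guard against that, returning NA for such stations; the helper name `mean_of_sample` is hypothetical and the toy data are invented for illustration:

```r
# Hypothetical helper: mean of n randomly drawn values, or NA if the
# group has fewer than n values to draw from
mean_of_sample <- function(x, n = 3) {
  if (length(x) < n) return(NA_real_)
  # Sample positions rather than values, so a length-one numeric x
  # is never interpreted as "sample from 1:x"
  mean(x[sample.int(length(x), n)])
}

# Invented toy data: station "B" has only two observations
toy <- data.frame(BayStation = c("A", "A", "A", "A", "B", "B"),
                  DIN = c(0.009, 0.006, 0.010, 0.002, 0.014, 0.011))

set.seed(1)
toy_means <- sapply(split(toy$DIN, toy$BayStation), mean_of_sample)
toy_means
```

Whether NA is the right answer for small stations (as opposed to dropping them or sampling with replacement) depends on the analysis; this is just one option.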
Upvotes: 0
Reputation: 9903
If you want to use plyr
ddply(DIN1998, .(BayStation),
summarise,
sample.mean=mean(sample(DIN, 3, replace=FALSE, prob=NULL)))
With set.seed(1) called first, you get
BayStation sample.mean
1 60069 0.00900000
2 60081 0.0166666
Upvotes: 1
Reputation: 81693
You can use tapply:
with(DIN1998, tapply(DIN, BayStation, function(x) mean(sample(x, 3))))
# returns a named vector with one sample mean per BayStation
# (the values depend on the random draw)
or aggregate:
aggregate(DIN ~ BayStation, DIN1998, function(x) mean(sample(x, 3)))
# returns a data frame with columns BayStation and DIN,
# where DIN holds the sample mean for each station
Upvotes: 0