Reputation: 119
I am trying to sample a subset of my data with replacement. Here is a simple example:
dat <- data.frame (
group = c(1,1,2,2,2,3,3,4,4,4,4,5,5),
var = c(0.1,0.0,0.3,0.4,0.8,0.5,0.2,0.3,0.7,0.9,0.2,0.4,0.6)
)
I want to sample by group number. If a group is selected (e.g., group = 1), all of its rows are selected (two rows for group 1 in the example above). If a group is selected more than once, each repeated selection is relabeled as a new group, e.g., 1.1, 1.1, 1.2, 1.2, …. The new data may look like this:
newdat <- data.frame (
group = c(3,3,5,5,3.1,3.1,1,1,3.2,3.2,5.1,5.1,3.3,3.3,2,2,2),
var = c(0.5,0.2,0.4,0.6,0.5,0.2,0.1,0.0,0.5,0.2,0.4,0.6,0.5,0.2,0.3,0.4,0.8)
)
Any help would be greatly appreciated.
Upvotes: 2
Views: 239
Reputation: 162411
Here's a fairly simple solution that uses make.unique() to create the names of the groups in newdat:
## Your data
dat <- data.frame (
group = c(1,1,2,2,2,3,3,4,4,4,4,5,5),
var = c(0.1,0.0,0.3,0.4,0.8,0.5,0.2,0.3,0.7,0.9,0.2,0.4,0.6)
)
## Groups drawn with replacement (hard-coded here for illustration)
n <- c(3,5,3,1,3,2,5,3,2)
## Make a 'look-up' data frame that associates sampled groups with new names,
## then use merge to create `newdat`
df <- data.frame(group = n,
newgroup = as.numeric(make.unique(as.character(n))))
newdat <- merge(df, dat)[-1]
names(newdat)[1] <- "group"
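The snippet above hard-codes the drawn groups. As a minimal sketch of the same lookup-and-merge idea with a random draw, something like the following could be used. The names `n_draws` and `lookup` and the fixed seed are assumptions for illustration, not part of the original answer. The labels are kept as character here, which is a deliberate variation: converting them with as.numeric() would make "3.1" and "3.10" indistinguishable if a group were drawn many times.

```r
dat <- data.frame(
  group = c(1,1,2,2,2,3,3,4,4,4,4,5,5),
  var   = c(0.1,0.0,0.3,0.4,0.8,0.5,0.2,0.3,0.7,0.9,0.2,0.4,0.6)
)

set.seed(1)    # assumption: fixed seed, only so the draw is reproducible
n_draws <- 5   # hypothetical number of groups to draw

## Draw group ids with replacement
n <- sample(unique(dat$group), n_draws, replace = TRUE)

## Look-up table: one row per draw, repeats get suffixes ".1", ".2", ...
lookup <- data.frame(group = n,
                     newgroup = make.unique(as.character(n)))

## merge() repeats each group's rows once per draw; drop the old id column
newdat <- merge(lookup, dat)[-1]
names(newdat)[1] <- "group"
```

Note that merge() sorts the result by the original group id, so the rows do not come back in draw order.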
Upvotes: 3
Reputation: 72769
Pick your n however you prefer:
n <- 5
Then run this (or make a function out of it):
lvls <- unique(dat$group)
gp.orig <- gp.samp <- sample( lvls, n, replace=TRUE ) #this is the actual sampling
library(taRifx) # provides stack.list() and sort() for data frames
res <- stack.list(lapply( gp.samp, function(i) dat[dat$group==i,] ))
# Now make your pretty group names
while(any(duplicated(gp.samp))) {
gp.samp[duplicated(gp.samp)] <- gp.samp[duplicated(gp.samp)] + .1
}
# Replace group with pretty group names (a simple merge doesn't work here because the groups are not unique)
gp.df <- as.data.frame(table(dat$group))
names(gp.df) <- c("group","n")
gp.samp.df <- merge(data.frame(group=gp.orig,pretty=gp.samp,order=seq(length(gp.orig))), gp.df )
gp.samp.df <- sort(gp.samp.df, f=~order)
res$pretty <- with( gp.samp.df, rep(pretty,n))
   group var pretty
6      3 0.5    3.0
7      3 0.2    3.0
12     5 0.4    5.0
13     5 0.6    5.0
61     3 0.5    3.1
71     3 0.2    3.1
62     3 0.5    3.2
72     3 0.2    3.2
3      2 0.3    2.0
4      2 0.4    2.0
5      2 0.8    2.0
This should be pretty general. If the same group may be drawn more than 10 times, you'll have to use text-based methods to build the "pretty" labels, because this numeric approach wraps over: e.g., the 11th copy of group 3 would be calculated as 3 + 10*.1 = 4!
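One possible text-based way to build the labels, as a minimal sketch in base R. The example draw and the names `occ` and `pretty` are assumptions for illustration. The labels come out as character strings, so "3.1" and "3.10" stay distinct and nothing wraps over into the next group number.

```r
gp.orig <- c(3, 5, 3, 1, 3, 2)   # assumed example draw of group ids

## Running count of how often each group has appeared so far (0 = first time)
occ <- ave(seq_along(gp.orig), gp.orig, FUN = seq_along) - 1

## First occurrence keeps the plain id; repeats get ".1", ".2", ... appended
pretty <- ifelse(occ == 0,
                 as.character(gp.orig),
                 paste(gp.orig, occ, sep = "."))

pretty
# "3"   "5"   "3.1" "1"   "3.2" "2"
```

The resulting vector has one label per draw, in draw order, so it can stand in for the numeric gp.samp when expanding to one label per row.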
Upvotes: 2