Reputation: 81
I have an R data frame with two levels of data: id and year. Within groups defined by id, the years increase, and every group in the dataset has the same years (and the same number of years), like so:
id year var1 var2
11A 2001 ... ...
11A 2002 ... ...
11A 2003 ... ...
11A 2004 ... ...
13B 2001 ... ...
13B 2002 ... ...
13B 2003 ... ...
13B 2004 ... ...
22Z 2001 ... ...
I have about 20,000 groups in my data, which is of course far too many to make readable plots of growth curves. How do I randomly select about 20 of my ids, while keeping all 4 yearly rows for each selected id?
Upvotes: 2
Views: 4832
Reputation: 6290
subset(df, id %in% sample(levels(df$id), 20))
That's assuming your data frame is called df and that your id column is a factor (use unique instead of levels if it's not).
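Here is a minimal self-contained sketch of the same idea. It assumes a small made-up data frame shaped like the one in the question (the ids, years and values are invented for illustration). It uses unique() rather than levels(), which works whether id is a character vector (the data.frame() default since R 4.0.0) or a factor. With a factor, levels() can also return levels that no longer occur in the data, for example after earlier subsetting, and drawing one of those would leave you with fewer than 20 groups.

```r
# Hypothetical example data shaped like the question's: 100 ids x 4 years
set.seed(1)
df <- data.frame(
  id   = rep(sprintf("G%03d", 1:100), each = 4),
  year = rep(2001:2004, times = 100),
  var1 = rnorm(400)
)

# Draw 20 distinct ids, then keep every row that belongs to them
picked <- sample(unique(df$id), 20)
df_sub <- subset(df, id %in% picked)

# Count of ids by number of rows: 20 ids, each with 4 rows
table(table(df_sub$id))
```

Because the filter is on id membership, all years of a drawn id come along automatically, so there is no need to sample rows directly.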
Upvotes: 2
Reputation: 109844
This is pretty straightforward if you use sample and then index. Here's a made-up example that looks similar to what you've presented. It's really only two lines of code and could be done in one if you wanted.
# made-up data: 5000 distinct ids, each on 4 rows (id is recycled to 20000 rows)
dat <- data.frame(id=paste0(LETTERS[1:8], rep(1:1250, 8)),
                  year=as.factor(sample(1990:2012, 20000, replace=TRUE)),
                  var1=rnorm(20000), var2=rnorm(20000))
# a look at the data
head(dat)
# sample 20 ids at random
(ids <- sample(unique(dat$id), 20))
# narrow your data set to all rows of those ids
dat2 <- dat[dat$id %in% ids, ]
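A sketch of the one-line version mentioned above, assuming a hypothetical stand-in for dat with 5000 ids and 4 rows each (the data are invented so the block runs on its own). The set.seed() call is an addition, not part of the original answer; it makes the same 20 ids come back on every run.

```r
# Hypothetical stand-in for `dat`: 5000 ids, 4 rows each
dat <- data.frame(
  id   = rep(paste0("id", 1:5000), each = 4),
  year = rep(2001:2004, times = 5000),
  var1 = rnorm(20000),
  var2 = rnorm(20000)
)

set.seed(42)  # makes the draw below reproducible

# One line: sample 20 distinct ids and keep all of their rows
dat2 <- dat[dat$id %in% sample(unique(dat$id), 20), ]
```

The trade-off versus the two-line version is that the drawn ids are not kept in their own variable; unique(dat2$id) recovers them if you need them later.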
Upvotes: 5