user1780218

Reputation: 81

Randomly select groups (and all cases per group) in R?

I have an R data frame with two levels of data: id and year. Within groups defined by id, the years increase (the entire dataset has the same number of years per group), like so:

id    year    var1    var2
11A   2001    ...     ...
11A   2002    ...     ...
11A   2003    ...     ...
11A   2004    ...     ...
13B   2001    ...     ...
13B   2002    ...     ...
13B   2003    ...     ...
13B   2004    ...     ...
22Z   2001    ...     ...

I have about 20,000 groups in my data, of course far too many to make nice plots of growth curves. How do I randomly select about 20 of my ids, so that all 4 rows of years corresponding to each selected id are selected as well?

Upvotes: 2

Views: 4832

Answers (2)

seancarmody

Reputation: 6290

subset(df, id %in% sample(levels(df$id), 20))

This assumes your data frame is called df and that id is a factor (use unique instead of levels if it is not).
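As a minimal sketch of the unique variant, here is a small made-up data frame (the name df, the 50 ids and the var1 column are assumptions for illustration only) in which id is a character vector rather than a factor. set.seed is there only to make the draw reproducible.

```r
# Toy data (hypothetical): 50 ids with 4 years each, id stored as character
df <- data.frame(
  id   = rep(sprintf("id%02d", 1:50), each = 4),
  year = rep(2001:2004, times = 50),
  var1 = rnorm(200),
  stringsAsFactors = FALSE
)

set.seed(1)                          # reproducibility only
picked <- sample(unique(df$id), 20)  # 20 distinct ids, drawn without replacement
sub    <- subset(df, id %in% picked) # keeps every row belonging to those ids

nrow(sub)               # 80 rows: 20 ids x 4 years
length(unique(sub$id))  # 20
```

One thing to watch with the levels version: if df was itself subset earlier, a factor can carry unused levels, so sample might draw ids that have no rows. Calling droplevels on the data frame first, or using unique as above, should avoid that.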

Upvotes: 2

Tyler Rinker

Reputation: 109844

This is pretty straightforward if you use sample and then index. Here's a made-up example that looks similar to what you've presented. It's really only two lines of code and could be done in one if you wanted.

# 5000 distinct ids, recycled by data.frame() to 20000 rows (4 rows per id)
dat <- data.frame(id=paste0(LETTERS[1:8], rep(1:1250, 8)), 
   year=as.factor(as.character(sample(1990:2012, 20000, TRUE))), 
   var1=rnorm(20000), var2=rnorm(20000))

#a look at the data
head(dat)

#sample 20 id's randomly
(ids <- sample(unique(dat$id), 20))

#narrow your data set
dat2 <- dat[dat$id %in% ids, ]
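
The one-line form mentioned above could look roughly like this. It is only a sketch on a separate made-up data frame (toy, with 100 hypothetical ids and 4 years each), and set.seed is there just to make the draw repeatable.

```r
# Toy data (hypothetical): 100 ids, 4 years each
toy <- data.frame(
  id   = rep(paste0("g", 1:100), each = 4),
  year = rep(2001:2004, times = 100),
  var1 = rnorm(400)
)

set.seed(42)  # reproducibility only

# Sample 20 ids and index in a single step
toy2 <- toy[toy$id %in% sample(unique(toy$id), 20), ]

# Check that every sampled id kept all four of its rows
rows_per_id <- table(as.character(toy2$id))
all(rows_per_id == 4)
nrow(toy2)  # 80 rows: 20 ids x 4 years
```

Keeping the sampled ids in their own variable, as in the two-line version, is handy if you want to reuse the same selection for several plots.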

Upvotes: 5
