Kaizen

Reputation: 151

How to sample rows from a dataframe using two categorical variables?

I have a data frame with two categorical variables.

samples <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
groups <- c(1, 1, 2, 3, 1, 1, 1, 2, 2, 2)
df <- data.frame(samples, groups)
df
   samples groups
1        A      1
2        A      1
3        A      2
4        A      3
5        B      1
6        B      1
7        B      1
8        C      2
9        C      2
10       C      2

For each combination of sample and group, I would like to downsample the data frame to a maximum of x rows. In this example, x = 2. Is there an easy way to do this?

Upvotes: 0

Views: 99

Answers (1)

daniellga

Reputation: 1224

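See if this is what you want. It keeps at most x rows for each combination of samples and groups, taking the first x rows in their original order: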

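library(data.table)
setDT(df)  # convert df to a data.table by reference

x <- 2  # maximum number of rows to keep per (samples, groups) combination

# Number the rows within each (samples, groups) combination
df[, index := seq_len(.N), by = .(samples, groups)]

# Keep the first x rows of each combination, then drop the helper column
df <- df[index <= x][, index := NULL]

df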

   samples groups
1:       A      1
2:       A      1
3:       A      2
4:       A      3
5:       B      1
6:       B      1
7:       C      2
8:       C      2
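
If the rows should be drawn at random rather than taken from the top of each combination, something like the following might work. This is only a sketch, not part of the original answer: it rebuilds the example data under the assumed names dt, keep, and sampled, and the seed value is arbitrary and only there to make the draw reproducible.

```r
library(data.table)

# Rebuild the example data from the question
samples <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C")
groups <- c(1, 1, 2, 3, 1, 1, 1, 2, 2, 2)
dt <- data.table(samples, groups)

x <- 2       # maximum number of rows per (samples, groups) combination
set.seed(1)  # arbitrary seed, used only for reproducibility

# .I holds the row numbers of dt that belong to the current combination;
# draw min(.N, x) of them at random without replacement
keep <- dt[, .I[sample.int(.N, min(.N, x))], by = .(samples, groups)]$V1

# Subset the original table, preserving the original row order
sampled <- dt[sort(keep)]
sampled
```

Using sample.int() on .N, rather than sample() on the row numbers directly, avoids the well-known pitfall where sample() treats a single number n as the range 1:n. Combinations with fewer than x rows are kept in full.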

Upvotes: 1
