I have a data frame with two categorical variables.
samples<-c("A","A","A","A","B","B","B","C","C","C")
groups<-c(1,1,2,3,1,1,1,2,2,2)
df<- data.frame(samples,groups)
df
samples groups
1 A 1
2 A 1
3 A 2
4 A 3
5 B 1
6 B 1
7 B 1
8 C 2
9 C 2
10 C 2
For each combination of sample and group, I would like to downsample the data frame to at most x rows. In this example, x = 2.
Is there an easy way to do this?
See if this is what you want:
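library(data.table)
setDT(df)  # convert the data.frame to a data.table by reference
x <- 2     # maximum number of rows to keep per samples/groups combination
# number the rows 1, 2, 3, ... within each samples/groups combination
df[, index := seq_len(.N), by = .(samples, groups)]
# keep only the first x rows of each combination, then drop the helper column
df <- df[index <= x][, index := NULL]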
df
samples groups
1: A 1
2: A 1
3: A 2
4: A 3
5: B 1
6: B 1
7: C 2
8: C 2
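If data.table is not available, the same result can be obtained in base R. Below is a minimal sketch, assuming the same samples/groups data as in the question. Like the data.table answer, the first part keeps the first x rows of each combination. If "downsample" is meant as random sampling instead, keep_random (a hypothetical helper name, not part of the answer above) draws up to x rows at random from each combination.

```r
# Base R sketch: keep at most x rows per samples/groups combination.
samples <- c("A","A","A","A","B","B","B","C","C","C")
groups  <- c(1,1,2,3,1,1,1,2,2,2)
df <- data.frame(samples, groups)
x <- 2

# Deterministic: keep the first x rows of each combination.
# ave() numbers the rows 1, 2, 3, ... within each samples/groups combination.
row_in_group <- ave(seq_len(nrow(df)), df$samples, df$groups, FUN = seq_along)
first_x <- df[row_in_group <= x, ]

# Random (hypothetical helper): keep up to x randomly chosen rows per combination.
# split() the row indices by combination, draw min(n, x) from each, then re-sort.
keep_random <- function(data, x) {
  idx <- split(seq_len(nrow(data)), list(data$samples, data$groups), drop = TRUE)
  picked <- unlist(lapply(idx, function(i) i[sample.int(length(i), min(length(i), x))]),
                   use.names = FALSE)
  data[sort(picked), ]
}
set.seed(1)  # make the random draw reproducible
random_x <- keep_random(df, x)
```

Indexing with i[sample.int(...)] rather than calling sample(i, ...) avoids R's surprise where sample() on a single number n draws from 1:n instead of returning n.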