Reputation: 3625
I have a matrix with many rows, say more than 5000 rows in each category, and I would like to get 4500 rows from each category. How can I do this in R?
I know about unique, but that returns just one element per category, and I need N elements per category.
Here is my data:
  cat f1 f2   f3
1   a 15 20  sdr
2   b  8  6  zrf
3   a 54  6   sf
4   c 32  8  azr
5   b 65 98 arfg
....
Upvotes: 0
Views: 71
Reputation: 19960
One 'brute-force' approach is to split your data by group and take the head of N rows from each piece, then bind the pieces back together into your new data.frame. This is the essence of 'split-apply-combine'.
df <- data.frame(group=rep(c("A","B"), each=10), var=rnorm(20))
# Number of Rows
N <- 5
# the split, apply(i.e. head), combine approach
do.call("rbind", lapply(split(df, f=df$group), function(x) head(x, n=N)))
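Note that head always returns the first N rows of each group. If a random selection of N rows per group is wanted instead (an assumption, the question does not say which rows to keep), the same split-apply-combine pattern can be sketched with sample. The helper name sample_rows is hypothetical, and set.seed is only there to make the illustration reproducible.

```r
set.seed(42)  # only for reproducibility of this illustration
df <- data.frame(group = rep(c("A", "B"), each = 10), var = rnorm(20))
N <- 5

# Hypothetical helper: pick up to n random rows from one group's data.frame.
# min() guards against groups that have fewer than n rows.
sample_rows <- function(x, n) {
  x[sample(nrow(x), min(n, nrow(x))), , drop = FALSE]
}

# split by group, sample within each piece, then bind the pieces together
sampled <- do.call("rbind", lapply(split(df, f = df$group), sample_rows, n = N))

# sanity check: number of rows kept per group
table(sampled$group)
```

Because both groups here have 10 rows, each group contributes exactly N rows to the result.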
The same approach will work if your data is in a matrix with a column containing some sort of unique group identifier, provided you call split.data.frame directly. It will still split your matrix into a list of 'sub' matrices.
mat <- matrix(c(rep(c(0,1), each=10), rnorm(20)),20,2)
do.call("rbind", lapply(split.data.frame(mat, f=mat[,1]), function(x) head(x, n=N)))
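As a rough illustration on data shaped like the sample in the question, the sketch below builds a character matrix from the five rows shown there and keeps the first row per category. The object names qmat and res, and the choice N = 1, are assumptions made only because the sample has so few rows; with the real data N would be 4500.

```r
# The five sample rows from the question, stored as a character matrix
qmat <- cbind(
  cat = c("a", "b", "a", "c", "b"),
  f1  = c("15", "8", "54", "32", "65"),
  f2  = c("20", "6", "6", "8", "98"),
  f3  = c("sdr", "zrf", "sf", "azr", "arfg")
)

N <- 1  # illustrative only; the sample is too small for a larger N

# split the matrix by its "cat" column, keep the first N rows of each piece,
# then bind the sub-matrices back together
res <- do.call("rbind",
               lapply(split.data.frame(qmat, f = qmat[, "cat"]),
                      function(x) head(x, n = N)))
res
```

The result is still a matrix, with the groups ordered by the sorted values of the cat column.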
EDIT
As suggested by @akrun below, you could also use dplyr if your object is a data.frame
library(dplyr)
df %>%
group_by(group) %>%
slice(seq_len(N))
Upvotes: 4