sop

Reputation: 3625

How to get x rows from each category in R?

I have a matrix with many rows, say more than 5000 per category, and I would like to extract 4500 rows from each category. How can I do this in R?

I know about unique, but it returns only one element per category, and I need N elements per category.


Here is my data:

    cat    f1    f2    f3
1   a      15    20    sdr
2   b      8     6     zrf
3   a      54    6     sf
4   c      32    8     azr
5   b      65    98    arfg
....

Upvotes: 0

Views: 71

Answers (1)

cdeterman

Reputation: 19960

One 'brute-force' approach is to split your data by group, take the head (the first N rows) of each piece, and then bind the pieces back together into a new data.frame. This is the essence of 'split-apply-combine'.

df <- data.frame(group=rep(c("A","B"), each=10), var=rnorm(20))

# Number of rows to keep per group
N <- 5
# the split, apply (i.e. head), combine approach
do.call("rbind", lapply(split(df, f=df$group), function(x) head(x, n=N)))
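As a minimal sketch of how this might look on data shaped like the question's, the example below uses made-up values and a hypothetical data frame name `dat`; only the `cat`, `f1` and `f2` column names come from the question. It also resets the row names, since `rbind` on a named list produces names such as "a.1".

```r
# Toy data mimicking the question's layout (values are invented for illustration)
dat <- data.frame(
  cat = c("a", "b", "a", "c", "b", "a", "c", "b"),
  f1  = c(15, 8, 54, 32, 65, 7, 21, 3),
  f2  = c(20, 6, 6, 8, 98, 11, 4, 9),
  stringsAsFactors = FALSE
)

N <- 2  # rows to keep per category

# split by category, keep the first N rows of each piece, bind back together
parts  <- lapply(split(dat, f = dat$cat), function(x) head(x, n = N))
result <- do.call("rbind", parts)

# rbind builds row names from the list names; drop them if they are not wanted
rownames(result) <- NULL

table(result$cat)
```

Note that head keeps the first N rows in their existing order, so a category with fewer than N rows is returned whole.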

The same approach works if your data is in a matrix with a column containing a group identifier, provided you call split.data.frame directly. It splits the matrix into a list of 'sub' matrices.

mat <- matrix(c(rep(c(0,1), each=10), rnorm(20)),20,2)

do.call("rbind", lapply(split.data.frame(mat, f=mat[,1]), function(x) head(x, n=N)))
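If splitting the matrix feels heavy, one possible alternative (not part of the original answer) is to compute a running counter within each group and index the matrix once. The sketch below assumes the group identifier sits in column 1, as in the example above; `idx` and `sub` are names chosen here for illustration.

```r
set.seed(1)  # only so the random column is reproducible
mat <- matrix(c(rep(c(0, 1), each = 10), rnorm(20)), 20, 2)
N <- 5

# ave() applies seq_along within each group, giving 1, 2, 3, ... per group value
idx <- ave(seq_len(nrow(mat)), mat[, 1], FUN = seq_along)

# keep rows whose within-group position is at most N; drop = FALSE keeps a matrix
sub <- mat[idx <= N, , drop = FALSE]
```

Unlike the split/rbind version, this keeps the rows in their original order rather than grouping them.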

EDIT

As suggested by @akrun below, you could also use dplyr if your object is a data.frame:

library(dplyr)
df %>% 
    group_by(group) %>% 
    slice(seq_len(N))  # seq_len(N) is safe when N is 0; seq(0) would give c(1, 0)

Upvotes: 4
