B. Davis

Reputation: 3441

Remove the first N rows from each factor level in an R data.frame

Given the dat below, how can I make a new data frame that includes all rows except the first five for each IndID? Said differently, I want a new data frame with the first 5 rows for each IndID excluded.

set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each  = 10),
                  Number = sample(1:100,40))

I have seen a number of SO posts that select rows by group, but I am not sure how to remove rows as described above.

Upvotes: 5

Views: 10990

Answers (4)

Frank

Reputation: 66819

If the data is sorted and you are guaranteed to have at least n rows per group...

n = 5
w = match(unique(dat$IndID), dat$IndID)
dat[- (rep(w, each = n) + 1:n - 1L), ]
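If the rows are not sorted by IndID, or a group might have fewer than n rows, one possible alternative is to number the rows within each group with ave() and keep only those numbered above n. This is a minimal sketch rather than part of the answer above; it rebuilds the dat from the question so it runs on its own, and the names pos and res_ave are purely illustrative.

```r
# Rebuild the example data from the question
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

n <- 5
# Position of each row within its IndID group (1, 2, 3, ... per group)
pos <- ave(seq_len(nrow(dat)), dat$IndID, FUN = seq_along)
# Keep only the rows that come after the first n of their group
res_ave <- dat[pos > n, ]
```

Because the positions are computed per group, this does not depend on the groups being contiguous, and a group with n or fewer rows simply contributes no rows.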

Upvotes: 6

Rich Scriven

Reputation: 99331

In base R, tapply() is handy when used on a sequence of row numbers with tail().

idx <- unlist(tapply(1:nrow(dat), dat$IndID, tail, -5))
dat[idx, ]

Note that this will be more efficient with use.names=FALSE in unlist().
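For illustration, the same call with use.names = FALSE might look like the sketch below. It rebuilds the dat from the question so it is self-contained, and res_tapply is just an illustrative name.

```r
# Rebuild the example data from the question
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

# Row numbers per group with the first 5 dropped; use.names = FALSE
# skips building names such as "AAA1", "AAA2", ... for the result
idx <- unlist(tapply(seq_len(nrow(dat)), dat$IndID, tail, -5),
              use.names = FALSE)
res_tapply <- dat[idx, ]
```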

With data.table, you can do the following with tail().

library(data.table)

setDT(dat)[dat[, tail(.I, -5), by=IndID]$V1]

Upvotes: 7

d.b

Reputation: 32548

You can use base R's split() to split dat by IndID, remove the first 5 rows of each subgroup, and then rbind() the pieces back together.

do.call(rbind, lapply(split(dat,as.character(dat$IndID)), function(x) x[-(1:5),]))

Upvotes: 3

GGamba

Reputation: 13680

We can use dplyr's slice() functionality:

library(dplyr)

dat %>% 
    group_by(IndID) %>% 
    slice(6:n())   # keep row 6 through the last row of each group
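One thing to watch: 6:n() counts backwards when a group has fewer than 6 rows (for a 3-row group it becomes 6:3), so it may not drop what you expect in that case. The question's data has 10 rows per group, so it is not affected. A hedged alternative, assuming dplyr is installed, is to filter on row_number() instead; res_dplyr is an illustrative name and the data is rebuilt so the block runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

res_dplyr <- dat %>%
  group_by(IndID) %>%
  filter(row_number() > 5) %>%  # keep rows 6, 7, ... within each group
  ungroup()
```

With this form, a group that has 5 or fewer rows contributes nothing to the result.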

Upvotes: 23
