Reputation: 3441
Given the data frame dat below, how can I create a subset that includes all rows except the first five for each IndID? Said differently, I want a new data frame with the first five rows of each IndID excluded.
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))
I have seen a number of SO posts that select rows, but I am not sure how to remove rows as described above.
Upvotes: 5
Views: 10990
Reputation: 66819
If the data is sorted by IndID and every group is guaranteed to have at least n rows:
n = 5
# position of the first row of each group
w = match(unique(dat$IndID), dat$IndID)
# drop rows w, w + 1, ..., w + n - 1 of every group
dat[- (rep(w, each = n) + 1:n - 1L), ]
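If those two assumptions do not hold, one possible alternative (a sketch, not the approach shown above) is to number the rows within each group using base R's ave() and keep those numbered above n. The names pos and res are just illustrative.

```r
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

n <- 5
# pos[i] is the position of row i within its IndID group (1, 2, 3, ...)
pos <- ave(seq_len(nrow(dat)), dat$IndID, FUN = seq_along)
# keep only rows that come after the first n rows of their group
res <- dat[pos > n, ]
```

Because the counter is computed per group, this should also work when the groups are interleaved, and a group with n or fewer rows is simply dropped entirely.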
Upvotes: 6
Reputation: 99331
In base R, tapply() is handy when used on a sequence of row numbers with tail().
idx <- unlist(tapply(1:nrow(dat), dat$IndID, tail, -5))
dat[idx, ]
Note that this will be more efficient with use.names=FALSE in unlist().
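For completeness, here is a self-contained sketch of the same idea with use.names = FALSE added. It rebuilds dat from the question; seq_len(nrow(dat)) is used instead of 1:nrow(dat) only as a defensive habit for empty data frames.

```r
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

# tail(x, -5) returns everything except the first 5 elements of x;
# applied per IndID to the row numbers, it yields the rows to keep.
idx <- unlist(tapply(seq_len(nrow(dat)), dat$IndID, tail, -5),
              use.names = FALSE)
res <- dat[idx, ]
```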
With data.table, you can do the following, again using tail().
library(data.table)
setDT(dat)[dat[, tail(.I, -5), by=IndID]$V1]
Upvotes: 7
Reputation: 32548
You can use split from base R to split dat by IndID, remove the first 5 rows of each sub-group, and then rbind the pieces back together.
do.call(rbind, lapply(split(dat,as.character(dat$IndID)), function(x) x[-(1:5),]))
Upvotes: 3
Reputation: 13680
We can use dplyr's slice() function:
library(dplyr)
dat %>%
  group_by(IndID) %>%
  slice(6:n())
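One caveat, as far as I understand slice(): if a group has fewer than 6 rows, 6:n() counts downward (for example 6:5 is c(6, 5)), and since out-of-range positions are ignored, a row can be kept unintentionally. A hedged sketch that sidesteps this by filtering on the within-group row number (res is an illustrative name):

```r
library(dplyr)

set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

# Keep rows whose position within their IndID group is greater than 5.
res <- dat %>%
  group_by(IndID) %>%
  filter(row_number() > 5) %>%
  ungroup()
```

With the example data every group has 10 rows, so both versions should return the same 20 rows; the difference only matters for short groups.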
Upvotes: 23