Reputation: 215
I followed the example in "Remove last N rows in data frame with the arbitrary number of rows", but it only deletes the last 50 rows of the whole data frame rather than the last 50 rows of every study site within it. I have a very large data set with multiple study sites; each study site has multiple depths, and each depth has a nutrient concentration.
I want to delete only the last 50 rows (depths) for each station, keeping all the other data unchanged.
E.g. station 1 has 250 depths, station 2 has 1000 depths, and station 3 has 150 depths.
My attempt below seems to remove the last 50 rows from the whole data frame rather than the last 50 from every station:
df<- df[-seq(nrow(df),nrow(df)-50),]
How can I take a grouping variable (study site) into account so that the rows are removed per station?
Upvotes: 1
Views: 167
Reputation: 615
We can use the slice function from the dplyr package:
library(dplyr)
df2 <- df %>% group_by(Col1) %>% slice(1:(n() - 4))
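This first groups the data frame by the category column (Col1). Provided the rows are already in the desired order within each group, it then removes the last n rows (4 in this case) of each category. Note that 1:(n() - 4) only behaves as intended when every group has more than n rows.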
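Applied to the question's setup, the same grouped approach might look like the sketch below. The column names `station` and `depth` and the toy data are assumptions made for illustration, not the asker's real data. It uses `filter()` with `row_number()` instead of `slice()`, so a station with 50 or fewer rows is simply dropped rather than causing an error.

```r
library(dplyr)

# Toy data mirroring the question: three stations with 250, 1000 and 150 depths.
# `station` and `depth` are assumed column names.
df <- data.frame(
  station = rep(paste("station", 1:3), c(250, 1000, 150)),
  depth   = sequence(c(250, 1000, 150))  # 1..250, 1..1000, 1..150
)

n_drop <- 50  # number of trailing rows to remove per station

trimmed <- df %>%
  group_by(station) %>%
  # keep a row only if it is not among the last n_drop rows of its station
  filter(row_number() <= n() - n_drop) %>%
  ungroup()

# Rows left per station: 200, 950 and 100
count(trimmed, station)
```

This relies on the rows already being ordered by depth within each station; if they are not, an `arrange(station, depth)` step before the `filter()` would be needed.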
Upvotes: 1
Reputation: 5456
A potential base R solution would be:
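# Example data: three stations with 250, 1000 and 150 depth measurements
d <- data.frame(station = rep(paste("station", 1:3), c(250, 1000, 150)),
                depth = rnorm(250 + 1000 + 150, 100, 10))
# Row position within each station (assumes the rows are sorted by station)
d$grp_counter <- do.call("c", lapply(tapply(d$depth, d$station, length), seq_len))
# Number of rows in each station, repeated for every row of that station
d$grp_length <- rep(tapply(d$depth, d$station, length), tapply(d$depth, d$station, length))
# Keep everything except the last 50 rows of each station
d <- d[d$grp_counter <= (d$grp_length - 50), ]
d
# Optionally drop the auxiliary columns: subset(d, select = -c(grp_counter, grp_length))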
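A possibly shorter base R variant uses `ave()`, which computes a per-group value and returns it in the original row order, so the rows of a station do not have to be contiguous. This is a sketch under the same assumed column names (`station`, `depth`) and toy data as above; `n_drop`, `pos`, `len` and `d_trimmed` are illustrative names.

```r
# Toy data: three stations with 250, 1000 and 150 depth measurements
d <- data.frame(
  station = rep(paste("station", 1:3), c(250, 1000, 150)),
  depth   = rnorm(250 + 1000 + 150, 100, 10)
)

n_drop <- 50                 # number of trailing rows to remove per station
idx <- seq_len(nrow(d))      # helper vector to apply ave() to

pos <- ave(idx, d$station, FUN = seq_along)  # row position within its station
len <- ave(idx, d$station, FUN = length)     # number of rows in its station

# Keep a row only if it is not among the last n_drop rows of its station
d_trimmed <- d[pos <= len - n_drop, ]

# Rows left per station: 200, 950 and 100
table(d_trimmed$station)
```

No auxiliary columns are added to the data frame, so nothing has to be removed afterwards.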
Upvotes: 2