L55

Reputation: 215

Deleting a subset of rows based on other variables

I followed the example in "Remove last N rows in data frame with the arbitrary number of rows", but it deletes only the last 50 rows of the whole data frame rather than the last 50 rows of every study site within it. I have a very large data set with multiple study sites; each study site has multiple depths, and each depth has a nutrient concentration.

I want to just delete the last 50 rows of depth for each station.

E.g. station 1 has 250 depths, station 2 has 1000 depths, station 3 has 150 depths

but keep all the other data consistent.

This just seems to remove the last 50 from the dataframe rather than the last 50 from every station...

 df <- df[-seq(nrow(df), nrow(df) - 50), ]

What should I do to add more variables (study site) to filter by?

Upvotes: 1

Views: 167

Answers (2)

Harshal Gajare

Reputation: 615

We can use the slice() function from the dplyr package:

library(dplyr)
df2 <- df %>% group_by(Col1) %>% slice(1:(n() - 4))

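This first groups by the category column (Col1). Provided the rows are already in the desired order within each group, it then removes the last n rows (here 4) of each category. Replace 4 with 50 to drop the last 50 rows per station. Note that 1:(n() - 4) only behaves as intended when every group has more than n rows.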
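Applied to the layout described in the question, the same idea might look like the sketch below. The column names station and depth, the toy data, and the variable n_drop are assumptions for illustration, not the asker's real data. filter() with row_number() is used here in place of slice() so that a station with 50 or fewer rows is simply dropped rather than producing an invalid index.

```r
library(dplyr)

# Toy data; the column names `station` and `depth` are assumed for illustration
df <- data.frame(
  station = rep(paste("station", 1:3), c(250, 1000, 150)),
  depth   = c(seq_len(250), seq_len(1000), seq_len(150))
)

n_drop <- 50  # hypothetical name: number of trailing rows to remove per station

df2 <- df %>%
  group_by(station) %>%
  # keep every row whose position within its station is not among the last n_drop
  filter(row_number() <= n() - n_drop) %>%
  ungroup()

# Rows remaining per station: 200, 950 and 100
count(df2, station)
```

As with slice(), this relies on the rows already being in depth order within each station; an arrange(station, depth) before the group_by() would make that explicit.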

Upvotes: 1

r.user.05apr

Reputation: 5456

A potential base R solution would be:

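# Example data: three stations with 250, 1000 and 150 depth readings
d <- data.frame(station = rep(paste("station", 1:3), c(250, 1000, 150)),
                depth = rnorm(250 + 1000 + 150, 100, 10))

# Number of rows per station (this approach assumes d is sorted by station)
n_per_station <- tapply(d$depth, d$station, length)
d$grp_counter <- do.call("c", lapply(n_per_station, seq_len))  # row position within its station
d$grp_length <- rep(n_per_station, n_per_station)              # size of the row's station
d <- d[d$grp_counter <= (d$grp_length - 50),]                  # drop the last 50 rows of each station
d

# To drop the auxiliary columns afterwards: subset(d, select = -c(grp_counter, grp_length))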
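If the auxiliary columns are not wanted at all, a variant with ave() might be worth considering. This is only a sketch under the same assumed layout (a station column, rows in depth order within each station); the names pos, size, n_drop and d_trimmed are made up for the example. Unlike the tapply() version, it does not require the data frame to be sorted by station, because ave() returns its results in the original row order.

```r
# Toy data with the same shape as above; depth values are just row counters here
d <- data.frame(
  station = rep(paste("station", 1:3), c(250, 1000, 150)),
  depth   = c(seq_len(250), seq_len(1000), seq_len(150))
)

n_drop <- 50  # hypothetical name: number of trailing rows to remove per station

# Position of each row within its station, and the total size of that station
pos  <- ave(seq_len(nrow(d)), d$station, FUN = seq_along)
size <- ave(seq_len(nrow(d)), d$station, FUN = length)

# Keep only rows that are not among the last n_drop of their station
d_trimmed <- d[pos <= size - n_drop, ]

table(d_trimmed$station)
```

Stations with 50 or fewer rows end up with no rows at all, which may or may not be what is wanted.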

Upvotes: 2
