cebola

Reputation: 251

How do I delete the last row of each group in R based on a condition in another column?

I have a data frame like this (NT and MCS are placeholder names for species):

A <- c("NT", "MCS","MCS","NT", "MCS", "MCS", "NT", "MCS", "MCS", "MCS", 
       "NT", "MCS", "MCS","NT", "MCS","NT","NT","MCS", "MCS","NT")
B <- c("1", "3", "3","3","3", "3","3","4","4","4","4","3", 
       "3","3","2","2","1","3","3","3")
C <- c("G1", "G2", "G2", "G2", "G3", "G3", "G3", "G4", "G4", "G4", "G4", 
       "G5", "G5", "G5","G6", "G6", "G7","G8","G8","G8")

df <- data.frame(A,B,C)

A   B   C
NT  1   G1
MCS 3   G2
MCS 3   G2
NT  3   G2
MCS 3   G3
MCS 3   G3
NT  3   G3
MCS 4   G4
MCS 4   G4
MCS 4   G4
NT  4   G4
MCS 3   G5
MCS 3   G5
NT  3   G5
MCS 2   G6
NT  2   G6
NT  1   G7
MCS 3   G8
MCS 3   G8
NT  3   G8

The A column represents the species. The B column holds an integer equal to the number of rows in each group. The C column identifies the unique groups. The criterion is as follows: delete the LAST row of every group if B > 1. If B = 1, then NT (the only row in that group) should remain. Here is what I need the result to look like:

A   B   C
NT  1   G1
MCS 3   G2
MCS 3   G2
MCS 3   G3
MCS 3   G3
MCS 4   G4
MCS 4   G4
MCS 4   G4
MCS 3   G5
MCS 3   G5
MCS 2   G6
NT  1   G7
MCS 3   G8
MCS 3   G8

new<- df %>% group_by(A, B) %>% slice(if(any(numb > 1)) 1:n())

The above is the closest I've come, but it doesn't evaluate to an integer or numeric vector (which is what I need it to do). I also tried this:

new <- df %>% group_by(A, B) %>% 
              slice(if(any(B > 1)) 1 else 1:n())

but it got rid of the repeated values (that is, the rows in column A that I did not want to delete) rather than only the last row of each group. Is there something I'm missing in the code I've run, or is there another method that would accomplish this (ideally in dplyr, but I'd be curious about all methods)?

Upvotes: 0

Views: 2399

Answers (2)

akrun

Reputation: 887048

With dplyr, we can also do

library(dplyr)
df %>%
     group_by(C) %>%
     slice(union(1, head(seq_len(n()), -1)))

Or with filter

df %>%
   group_by(C) %>%
   filter(row_number() != n() | n()==1)
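
As a quick sanity check, here is a minimal sketch that rebuilds the data frame from the question and applies the filter version. The main difference from the attempt in the question is that the grouping is done on C (the group id) rather than on A and B. The name `res` is only used here for illustration.

```r
library(dplyr)

# Data from the question
A <- c("NT", "MCS", "MCS", "NT", "MCS", "MCS", "NT", "MCS", "MCS", "MCS",
       "NT", "MCS", "MCS", "NT", "MCS", "NT", "NT", "MCS", "MCS", "NT")
B <- c("1", "3", "3", "3", "3", "3", "3", "4", "4", "4", "4", "3",
       "3", "3", "2", "2", "1", "3", "3", "3")
C <- c("G1", "G2", "G2", "G2", "G3", "G3", "G3", "G4", "G4", "G4", "G4",
       "G5", "G5", "G5", "G6", "G6", "G7", "G8", "G8", "G8")
df <- data.frame(A, B, C)

res <- df %>%
  group_by(C) %>%
  # keep every row except the last one, unless the group has a single row
  filter(row_number() != n() | n() == 1) %>%
  ungroup()

nrow(res)     # 14 rows, same as the desired output in the question
table(res$A)  # only the NT rows of the single-row groups G1 and G7 remain
```

Six of the eight groups have more than one row, so six rows are dropped from the original 20.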

Upvotes: 1

eddi

Reputation: 49448

library(dplyr)
df %>% group_by(C) %>% slice(if(n() > 1) 1:(n()-1) else 1)

or

library(data.table)
setDT(df)

df[, if (.N > 1) head(.SD, -1) else .SD, by = C]

or, for maximum speed (which incidentally also keeps the original column order):

df[df[, if (.N > 1) head(.I, -1) else .I, by = C]$V1]
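
Since the question asks about other methods as well, the same rule can probably also be written in base R with `ave()`. This is only a sketch, not part of the answers above: it assumes that "last" means last in row order within each value of C, and the names `n_in_group`, `pos_in_group`, `keep` and `res_base` are made up for illustration. The data frame is rebuilt as a plain data.frame so the block runs on its own.

```r
# Data from the question, as a plain data.frame
A <- c("NT", "MCS", "MCS", "NT", "MCS", "MCS", "NT", "MCS", "MCS", "MCS",
       "NT", "MCS", "MCS", "NT", "MCS", "NT", "NT", "MCS", "MCS", "NT")
B <- c("1", "3", "3", "3", "3", "3", "3", "4", "4", "4", "4", "3",
       "3", "3", "2", "2", "1", "3", "3", "3")
C <- c("G1", "G2", "G2", "G2", "G3", "G3", "G3", "G4", "G4", "G4", "G4",
       "G5", "G5", "G5", "G6", "G6", "G7", "G8", "G8", "G8")
df <- data.frame(A, B, C)

# Size of the group each row belongs to
n_in_group <- ave(seq_along(df$C), df$C, FUN = length)
# Position of each row within its group (1, 2, 3, ...)
pos_in_group <- ave(seq_along(df$C), df$C, FUN = seq_along)

# Keep single-row groups whole; otherwise drop the last row of the group
keep <- n_in_group == 1 | pos_in_group < n_in_group
res_base <- df[keep, ]

nrow(res_base)  # 14
```

Like the `.I` version, this subsets the original rows by a logical index, so the column order is left unchanged.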

Upvotes: 3
