Reputation: 323
I am trying to delete all observations (rows) after a maximum value is reached when using group_by
. In other words, I want to select only the observations equal to or less than a maximum in each group. I have a dataset similar to the below in which I want to group_by(Group1, Group2)
UID<-1:30
Group1<-c(rep("jedi", 10), rep("sith", 10), rep("powerless", 10))
Group2<-c(rep("blue", 5), rep("green", 5), rep("purple", 5), rep("red", 5), rep("yellow", 5), rep("orange", 5))
Value<- c(1,2,3,3,3,0,0,1,1,1,1,1,1,2,3,0,2,4,4,4,2,3,3,5,5,0,3,5,5,5)
df<-data.frame(UID, Group1, Group2, Value)
My expected output would look like this;
UID.Expected<-c(1,2,3,6,7,8,11,12,13,14,15,16,17,18,21,22,23,25,26,27,28)
df.expected <- df %>% filter(UID %in% UID.Expected)
I have tried top_n
, slice_min
/max
, select
, and filter
but can't figure it out. I know it something super simple but can't quite get there. Assistance is greatly appreciated.
Upvotes: 1
Views: 164
Reputation: 886978
We can use match
to get the index of the first 'max' 'Value' per group, get the seq
uence and use that in slice
library(dplyr)
df %>%
group_by(Group1, Group2) %>%
slice(seq(match(max(Value), Value))) %>%
ungroup %>%
arrange(UID) %>%
as.data.frame
-output
# UID Group1 Group2 Value
#1 1 jedi blue 1
#2 2 jedi blue 2
#3 3 jedi blue 3
#4 6 jedi green 0
#5 7 jedi green 0
#6 8 jedi green 1
#7 11 sith purple 1
#8 12 sith purple 1
#9 13 sith purple 1
#10 14 sith purple 2
#11 15 sith purple 3
#12 16 sith red 0
#13 17 sith red 2
#14 18 sith red 4
#15 21 powerless yellow 2
#16 22 powerless yellow 3
#17 23 powerless yellow 3
#18 24 powerless yellow 5
#19 26 powerless orange 0
#20 27 powerless orange 3
#21 28 powerless orange 5
Upvotes: 1