Reputation: 5897
I have this problem
library(dplyr)
problem = data.frame(id = c(1,1,1,2,2,2), var1 = c(5,4,3, 6,5,4), var2 = c(99,12,32,88,9,8))
For each id, I want to only keep row with second largest value of var1. I tried different ways (dplyr, base R):
problem %>%
group_by(id) %>%
slice_tail(2, -var1)
problem[with(problem, ave(var1, id, FUN = function(x) x == tail(sort(x), 2)[1])), ]
First code doesn;t work, second code gives wrong answer.
What am I doing wrong?
Upvotes: 2
Views: 142
Reputation: 886938
After arrange
ing the 'var1' on desc
ending use slice
with 2
library(dplyr)
problem %>%
arrange(id, desc(var1)) %>%
group_by(id) %>%
slice(2) %>%
ungroup
-output
# A tibble: 2 × 3
id var1 var2
<dbl> <dbl> <dbl>
1 1 4 12
2 2 5 9
Upvotes: 0
Reputation: 1481
In case you have volume, here is a data.table
approach.
problem = data.frame(id = c(1,1,1,2,2,2), var1 = c(5,4,3, 6,5,4), var2 = c(99,12,32,88,9,8))
setDT(problem)
setorder(problem, id, - var1)
problem[, .SD[2], by=id]
As for @paul Stafford Allen comment, you will have issue for groups of size only 1.
Upvotes: 1
Reputation: 19097
If you wish to use slice
, I guess you can first slice_max()
the largest two rows, than slice_tail
to remove the largest row.
library(dplyr)
problem %>%
group_by(id) %>%
slice_max(var1, n = 2) %>%
slice_tail(n = 1)
Or you can use a single filter
:
problem %>% group_by(id) %>% filter(var1 == max(var1[var1 != max(var1)]))
# A tibble: 2 × 3
# Groups: id [2]
id var1 var2
<dbl> <dbl> <dbl>
1 1 4 12
2 2 5 9
Upvotes: 1
Reputation: 3071
problem |> group_by(id) %>% arrange(var1) %>% slice(n()-1)
n() counts the number of rows in each group. slice(n()-1) takes the n-1th element. Note this will cause issues with groups with fewer than 2 members - you may wish to allow for that.
Upvotes: 1