Reputation: 983
I would like to create a new column based on the n highest values per row of a data frame.
Take the following example:
library(tibble)
df <- tribble(~name, ~q_1, ~q_2, ~q_3, ~sum_top_2,
"a", 4, 1, 5, 9,
"b", 2, 8, 9, 17)
Here, the sum_top_2 column sums the 2 highest values of columns prefixed with "q_". I would like to generalize to the n highest values by row. How can I do this using dplyr
without reshaping?
Upvotes: 6
Views: 343
Reputation: 887078
One option is pmap
from purrr
to loop over the rows of the columns that starts_with
'q_', by sort
ing the row in decreasing
order, get the first 'n' sorted elements with head
and sum
library(dplyr)
library(purrr)
library(stringr)
n <- 2
df %>%
mutate(!! str_c("sum_top_", n) := pmap_dbl(select(cur_data(),
starts_with('q_')),
~ sum(head(sort(c(...), decreasing = TRUE), n))))
-output
# A tibble: 2 x 5
name q_1 q_2 q_3 sum_top_2
<chr> <dbl> <dbl> <dbl> <dbl>
1 a 4 1 5 9
2 b 2 8 9 17
Or use rowwise
from dplyr
.
df %>%
rowwise %>%
mutate(!! str_c("sum_top_", n) := sum(head(sort(c_across(starts_with("q_")),
decreasing = TRUE), n))) %>%
ungroup
# A tibble: 2 x 5
name q_1 q_2 q_3 sum_top_2
<chr> <dbl> <dbl> <dbl> <dbl>
1 a 4 1 5 9
2 b 2 8 9 17
Upvotes: 4