Reputation: 25608

Computing total rank for multiple columns using NSE syntax

My goal is to write a function take_by_rank that

can operate on arbitrary selection of numeric columns within a data frame;
uses non-standard evaluation like base::subset or dplyr verbs;
understands the minus sign naturally, so that -foo means "the largest value of foo gets the lowest rank";
returns n top or bottom rows by the total rank, which is the sum of ranks computed for each of the selected variables.

I'm interested in both learning the newest dplyr way and looking for alternative approaches, i.e. there is no restriction on package selection (pure base or data.table maybe?).

My current solution is

library(data.table)
library(dplyr)
library(rlang)

take_by_rank <- function(df, ..., n = 100) {
    selected_vars <- quos(...)
    if (!length(selected_vars))
        stop("No variables to rank!")
    prefix <- ".rank_"
    for (i in seq_along(selected_vars)) {
        rank_name <- paste0(prefix, quo_name(selected_vars[[i]]))
        df <- df %>%
            mutate(!!rank_name := frankv(!!selected_vars[[i]]))
    }
    df %>%
        mutate(TotalRank = rowSums(select(df, starts_with(prefix)))) %>%
        arrange(TotalRank) %>%
        top_n(n, -TotalRank)
}

It seems to be okay, but maybe I'm missing something more straightforward. If there's a way to replace the for loop, that would also be nice.

Usage examples (for reference)

take_by_rank(mtcars, mpg, qsec, n = 3)
   mpg cyl disp  hp drat   wt  qsec vs am gear carb .rank_mpg .rank_qsec TotalRank
1 13.3   8  350 245 3.73 3.84 15.41  0  0    3    4         3          3         6
2 15.0   8  301 335 3.54 3.57 14.60  0  1    5    8         6          2         8
3 14.3   8  360 245 3.21 3.57 15.84  0  0    3    4         4          5         9

take_by_rank(mtcars, mpg, qsec, n = -3)
   mpg cyl  disp hp drat    wt  qsec vs am gear carb .rank_mpg .rank_qsec TotalRank
1 22.8   4 140.8 95 3.92 3.150 22.90  1  0    4    2      24.5         32      56.5
2 32.4   4  78.7 66 4.08 2.200 19.47  1  1    4    1      31.0         27      58.0
3 33.9   4  71.1 65 4.22 1.835 19.90  1  1    4    1      32.0         28      60.0

take_by_rank(mtcars, mpg, -qsec, n = 3)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb .rank_mpg .rank_-qsec TotalRank
1 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1      14.0           2      16.0
2 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4       1.5          15      16.5
3 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4       1.5          16      17.5

Upvotes: 1

Answers (3)

GGamba

Reputation: 13680

As you noted the use of mutate_at make it impossible (or very hard) to work with the -foo behavior.

I propose you this solution. It's not extremely different from what you did.
I changed the for-loop with purrr::map, and streamlined the creation of total_rank.

library(tidyverse)
# ....
library(rlang)
# ....

take_by_rank <- function(df, ..., n = 100) {
    # original quosures
    selected_vars <- quos(...)

    if (!length(selected_vars))
        stop("No variables to rank!")

    suffixed_vars <- map(selected_vars, ~ {
        paste0(quo_name(.x), '_rank') %>%
            as.name() %>%
            as_quosure()
    })

    selected_vars %>%
        map( ~ {
            rank_name <- paste0(quo_name(.x), '_rank')
            df %>%                   # or whatever rank function you want
                mutate(!!rank_name := dense_rank(!!.x))
        }) %>%
        reduce(full_join) %>%
        mutate(total_rank = '+'(!!!suffixed_vars)) %>% # !!! = unquote and splice
        top_n(n, -total_rank)

}

take_by_rank(mtcars, mpg, qsec, n = 3)
#> Joining, by = c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
#>    mpg cyl disp  hp drat   wt  qsec vs am gear carb mpg_rank qsec_rank
#> 1 14.3   8  360 245 3.21 3.57 15.84  0  0    3    4        3         5
#> 2 13.3   8  350 245 3.73 3.84 15.41  0  0    3    4        2         3
#> 3 15.0   8  301 335 3.54 3.57 14.60  0  1    5    8        5         2
#>   total_rank
#> 1          8
#> 2          5
#> 3          7

take_by_rank(mtcars, mpg, qsec, n = -3)
#> Joining, by = c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
#>    mpg cyl  disp hp drat    wt  qsec vs am gear carb mpg_rank qsec_rank
#> 1 22.8   4 140.8 95 3.92 3.150 22.90  1  0    4    2       19        30
#> 2 32.4   4  78.7 66 4.08 2.200 19.47  1  1    4    1       24        25
#> 3 33.9   4  71.1 65 4.22 1.835 19.90  1  1    4    1       25        26
#>   total_rank
#> 1         49
#> 2         49
#> 3         51

take_by_rank(mtcars, mpg, -qsec, n = 3)
#> Joining, by = c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb mpg_rank -qsec_rank
#> 1 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1       12          2
#> 2 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4        1         14
#> 3 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4        1         15
#>   total_rank
#> 1         14
#> 2         15
#> 3         16

take_by_rank(mtcars, mpg,  n = 3)
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb mpg_rank total_rank
#> 1 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4        1          1
#> 2 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4        1          1
#> 3 13.3   8  350 245 3.73 3.840 15.41  0  0    3    4        2          2

Upvotes: 0

Lionel Henry

Reputation: 6803

You can pass the dots to vars() before passing them to mutate_at()

mutate_at(df, vars(...), myfuns)

This is equivalent to passing the dots to tidyselect::vars_select() and then to mutate_at():

vars <- tidyselect::vars_select(tbl_vars(df), ...)
mutate_at(df, vars, myfuns)

Upvotes: 2

Fredrik Isaksson

Reputation: 101

As Alex P also suggest you can use mutate_at() to remove the for loop, we can then rewrite the function as:

take_by_rank <- function(df, ..., n = 100) {
  selected_vars <- quos(...)
  if (!length(selected_vars))
    stop("No variables to rank!")
  df <- df %>% 
    mutate_at(selected_vars,  funs(rank = frankv)) %>%  
    mutate(TotalRank = rowSums(select(., ends_with("_rank")))) %>%
    arrange(TotalRank) %>%
    top_n(n, -TotalRank)
}

This will apply frankv to all selected vars and add new columns with the suffix _rank. I also changed the select statement to reference the piped data.frame. If you want to have complete variable name matching for the Totalrank calculation this will work.

 take_by_rank_matching <- function(df, ..., n = 100) {
      selected_vars <- quos(...)
      if (!length(selected_vars))
        stop("No variables to rank!")
      df <- df %>% 
        mutate_at(selected_vars,  funs(rank = frankv)) %>%  
        mutate(TotalRank = rowSums(
          select_at(., unlist(lapply(selected_vars,
                                     function(x)
                                       paste0(quo_label(x), "_rank")))))) %>%
        arrange(TotalRank) %>%
        top_n(n, -TotalRank)
    }

Although I think there might be a cleaner way.

Upvotes: 1

Computing total rank for multiple columns using NSE syntax

Answers (3)

Related Questions