J.Sabree

Reputation: 2536

How to use across and mutate across an entire dataset that has multiple column types?

I'm trying to use dplyr's across and case_when across my entire dataset, so whenever it sees "Strongly Agree" it changes it to a numeric 5, "Agree" to a numeric 4, and so on. I've tried looking at this answer, but I'm getting an error because my dataset has logical and numeric columns and R rightfully says that "Agree" can't be in a logical column, etc.

Here's my data:

library(dplyr)
test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

Ideally, I'd like one command that goes across the entire dataset. If that can't be done, I'd like a single across() call that combines several contains() selections (e.g., here, column names containing "s" or "f").

Here's what I've tried already to no avail:

library(dplyr)
test %>%
  mutate(across(.), 
         ~ case_when(. == "Strongly Agree" ~ 5, 
                     . == "Agree" ~ 4,
                     . == "Neutral" ~ 3,
                     . == "Disagree" ~ 2,
                     . == "Strongly Disagree" ~ 1,
                     TRUE ~ NA))

Error: Problem with `mutate()` input `..1`.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type `tbl_df<
  name: character
  date: character
  s1  : character
  s2rl: character
  f1  : character
  f2rl: character
  exam: double
>`.
ℹ It must be numeric or character.
ℹ Input `..1` is `across(.)`.

Upvotes: 4

Views: 11948

Answers (2)

TarJae

Reputation: 78947

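We could also do it the other way round: first map each label to a character digit ("5", "4", and so on). In this case the fallback has to be NA_character_, which is the NA of character type, so that every branch of case_when() returns the same type. At the end, type.convert(as.is = TRUE) turns the digit strings into integers: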

library(dplyr)
test %>%
    mutate(across(s1:f2rl, 
           ~ case_when(. == "Strongly Agree" ~ "5", 
                       . == "Agree" ~ "4",
                       . == "Neutral" ~ "3",
                       . == "Disagree" ~ "2",
                       . == "Strongly Disagree" ~ "1",
                       TRUE ~ NA_character_ ))) %>% 
    type.convert(as.is = TRUE)
# A tibble: 3 x 8
  name   date          s1  s2rl    f1  f2rl  exam early
  <chr>  <chr>      <int> <int> <int> <int> <int> <lgl>
1 Justin 2021-08-09     4     4     5     5    90 TRUE 
2 Corey  2021-10-29     3     3     2     2    99 FALSE
3 Sibley 2021-01-01     1     1     1     1   100 FALSE
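
The output above shows that type.convert() acts on the whole data frame, so exam becomes an integer column as well. If that is not wanted, one possible variant (a sketch, not the code from this answer) returns integer literals from case_when() directly, so only the selected columns change type. The shortened test data and the name res are assumptions made for illustration:

```r
library(dplyr)

# Shortened version of the question's data (illustration only)
test <- tibble(name = c("Justin", "Corey", "Sibley"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100))

# Integer literals (5L, ...) plus NA_integer_ keep every branch the same type,
# so no type.convert() step is needed afterwards
res <- test %>%
    mutate(across(s1:f1,
                  ~ case_when(. == "Strongly Agree" ~ 5L,
                              . == "Agree" ~ 4L,
                              . == "Neutral" ~ 3L,
                              . == "Disagree" ~ 2L,
                              . == "Strongly Disagree" ~ 1L,
                              TRUE ~ NA_integer_)))
```

Here s1 and f1 should come back as integer columns, while exam is left as a double because it is outside the s1:f1 range.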

Upvotes: 3

akrun

Reputation: 887251

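We can use matches() to select columns with a regular expression. Here '^(s|f)' picks every column whose name starts with "s" or "f":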

library(dplyr)
test %>% 
    mutate(across(matches('^(s|f)'),
                  ~ case_when(. == "Strongly Agree" ~ 5,
                              . == "Agree" ~ 4,
                              . == "Neutral" ~ 3,
                              . == "Disagree" ~ 2,
                              . == "Strongly Disagree" ~ 1,
                              TRUE ~ NA_real_)))

Output:

# A tibble: 3 x 8
  name   date          s1  s2rl    f1  f2rl  exam early
  <chr>  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 Justin 2021-08-09     4     4     5     5    90 TRUE 
2 Corey  2021-10-29     3     3     2     2    99 FALSE
3 Sibley 2021-01-01     1     1     1     1   100 FALSE

According to ?across:

across() makes it easy to apply the same transformation to multiple columns, allowing you to use select() semantics inside in "data-masking" functions like summarise() and mutate().

The ?select help page lists the selection helpers, all of which can be used inside across() as well:

Tidyverse selections implement a dialect of R where operators make it easy to select variables:

: for selecting a range of consecutive variables.

! for taking the complement of a set of variables.

& and | for selecting the intersection or the union of two sets of variables.

c() for combining selections.

In addition, you can use selection helpers. Some helpers select specific columns:

everything(): Matches all variables.

last_col(): Select last variable, possibly with an offset.

These helpers select variables by matching patterns in their names:

starts_with(): Starts with a prefix.

ends_with(): Ends with a suffix.

contains(): Contains a literal string.

matches(): Matches a regular expression.

num_range(): Matches a numerical range like x01, x02, x03.

These helpers select variables from a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

any_of(): Same as all_of(), except that no error is thrown for names that don't exist.

This helper selects variables with a function:

where(): Applies a function to all variables and selects those for which the function returns TRUE.
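
As a rough illustration of how these helpers combine inside across(), the sketch below selects the survey columns in two ways: by name, using c() to join two starts_with() selections, and by type, using where(is.character) with name and date excluded. The helper likert_to_num, the shortened data, and the names by_name and by_type are hypothetical and only used for this example:

```r
library(dplyr)

# Shortened version of the question's data (illustration only)
test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

# Hypothetical helper (not part of dplyr): maps a Likert label to its score
likert_to_num <- function(x) {
    case_when(x == "Strongly Agree" ~ 5,
              x == "Agree" ~ 4,
              x == "Neutral" ~ 3,
              x == "Disagree" ~ 2,
              x == "Strongly Disagree" ~ 1,
              TRUE ~ NA_real_)
}

# Name-based: union of two helpers combined with c()
by_name <- test %>%
    mutate(across(c(starts_with("s"), starts_with("f")), likert_to_num))

# Type-based: every character column except name and date
by_type <- test %>%
    mutate(across(where(is.character) & !c(name, date), likert_to_num))
```

With this data both calls pick the same columns (s1 and f1). The type-based form avoids the original error because logical and numeric columns such as early and exam are never passed to case_when(), but name and date have to be excluded by hand, otherwise they would be turned into NA.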

Upvotes: 6
