Biased_Observer

Reputation: 87

How to combine the across() function with mutate() and case_when() to mutate values in multiple columns according to a condition?

I have a demographic data set that includes the age of each person in a household. The data is collected via a survey, and participants are allowed to refuse to provide their age.

The result is a data set with one household per row (each with a household ID code) and various household characteristics, such as age, in the columns. Refused responses are coded as "R". You can re-create a sample with the code below:

library(dplyr)  # as_tibble(), mutate(), case_when(), across()

df <- list(Household_ID = c("1A", "1B", "1C", "1D", "1E"),
           AGE1 = c("25", "47", "39", "50", "R"),
           AGE2 = c("66", "23", "71", "R", "16"),
           AGE3 = c("28", "17", "R", "R", "80"),
           AGE4 = c("81", "22", "48", "59", "R"))

df <- as_tibble(df)

> df
# A tibble: 5 x 5
  Household_ID AGE1  AGE2  AGE3  AGE4 
  <chr>        <chr> <chr> <chr> <chr>
1 1A           25    66    28    81   
2 1B           47    23    17    22   
3 1C           39    71    R     48   
4 1D           50    R     R     59   
5 1E           R     16    80    R 

For our purposes, we recode "R" as "-9" so that we can then convert the AGE columns to integer and carry out the analysis. We usually do this in other software, and my objective is to replicate the process in R.

I have managed to do this with the following code:

df <- df %>% mutate(AGE1 = case_when(AGE1 == "R" ~ "-9", TRUE ~ as.character(AGE1)))
df <- df %>% mutate(AGE2 = case_when(AGE2 == "R" ~ "-9", TRUE ~ as.character(AGE2)))
df <- df %>% mutate(AGE3 = case_when(AGE3 == "R" ~ "-9", TRUE ~ as.character(AGE3)))
df <- df %>% mutate(AGE4 = case_when(AGE4 == "R" ~ "-9", TRUE ~ as.character(AGE4)))

Given that this feels clumsy, I tried to find a solution using mutate_if() and similar functions, but read that these have been superseded by across(). So I tried to replicate the operation using across():

df <- df %>%
  mutate(across(AGE1:AEG4),
          ~ (case_when(. == "R" ~ "-9")))

But I get the following error:

Error: Problem with `mutate()` input `..2`.
x Input `..2` must be a vector, not a `formula` object.
i Input `..2` is `~(case_when(. == "R" ~ "-9"))`.

I have been wrestling with this and googling for a while now, but can't figure out what I am missing. I would really appreciate some input on how to get this working. Thank you!

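EDIT: Solved! The lambda has to be the second argument of across(). In my attempt the closing parenthesis of across() came right after the column selection (which also misspelled AGE4 as AEG4), so the formula was passed to mutate() as a separate input. In addition, case_when() needs a TRUE ~ fallback; otherwise every value other than "R" becomes NA: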

df <- df %>%
  mutate(across(AGE1:AGE4, ~ (case_when(.x == "R" ~ "-9", TRUE ~ as.character(.x)))))
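Since the AGE columns are meant to end up as integers anyway, the recode and the type conversion can probably be folded into a single across() call. A minimal sketch, assuming dplyr >= 1.0 and the same sample data (rebuilt here with tibble() so the block runs on its own):

```r
library(dplyr)

df <- tibble(
  Household_ID = c("1A", "1B", "1C", "1D", "1E"),
  AGE1 = c("25", "47", "39", "50", "R"),
  AGE2 = c("66", "23", "71", "R", "16"),
  AGE3 = c("28", "17", "R", "R", "80"),
  AGE4 = c("81", "22", "48", "59", "R")
)

# For each AGE column: recode "R" to "-9", keep all other values,
# then convert the resulting character vector to integer
df_int <- df %>%
  mutate(across(AGE1:AGE4,
                ~ as.integer(case_when(.x == "R" ~ "-9", TRUE ~ .x))))
```

Household_ID is outside the AGE1:AGE4 selection, so it stays character.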

Upvotes: 6

Views: 3181

Answers (3)

Anoushiravan R

Reputation: 21908

Or maybe this one, which is not much different from dear @TarJae's interpretation:

library(dplyr)
library(stringr)


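df %>%
  # Replace the refusal code "R" with "-9" in each AGE column
  mutate(across(AGE1:AGE4, ~ str_replace(., "R", "-9")),
         # then convert those columns to integer
         across(AGE1:AGE4, as.integer))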

# A tibble: 5 x 5
  Household_ID  AGE1  AGE2  AGE3  AGE4
  <chr>        <int> <int> <int> <int>
1 1A              25    66    28    81
2 1B              47    23    17    22
3 1C              39    71    -9    48
4 1D              50    -9    -9    59
5 1E              -9    16    80    -9
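One caveat with str_replace(): its pattern is a regular expression, so "R" matches the first R anywhere in a string, not only values that are exactly "R". That is harmless for this data, but if a column could contain other text, an anchored pattern is safer. A small sketch; the vector `x` and the value "Refused" are hypothetical, chosen only to show the difference:

```r
library(stringr)

# Hypothetical values: an age, the refusal code, and a free-text answer
x <- c("25", "R", "Refused")

# Unanchored: replaces the first "R" found anywhere in each string
loose <- str_replace(x, "R", "-9")

# Anchored with ^ and $: only replaces values that are exactly "R"
strict <- str_replace(x, "^R$", "-9")
```

Here `loose` turns "Refused" into "-9efused", while `strict` leaves it unchanged.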

Data:

df <- list(Household_ID = c("1A", "1B", "1C", "1D", "1E"),
           AGE1 = c("25", "47", "39", "50", "R"),
           AGE2 = c("66", "23", "71", "R", "16"),
           AGE3 = c("28", "17", "R", "R", "80"),
           AGE4 = c("81", "22", "48", "59", "R"))

df <- as_tibble(df)

Upvotes: 3

TarJae

Reputation: 78917

You could use across() with replace():

  1. Convert the list to a tibble with as_tibble()
  2. Replace "R" with "-9"
  3. Convert the AGE columns to integer with type.convert()

df %>% 
  as_tibble() %>% 
  mutate(across(everything(), ~replace(., . ==  "R" , "-9"))) %>% 
  type.convert(as.is=TRUE)

Output:

  Household_ID  AGE1  AGE2  AGE3  AGE4
  <chr>        <int> <int> <int> <int>
1 1A              25    66    28    81
2 1B              47    23    17    22
3 1C              39    71    -9    48
4 1D              50    -9    -9    59
5 1E              -9    16    80    -9
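The last step works because base R's type.convert() also has a data-frame method: each column is converted to the simplest type that fits, and with as.is = TRUE non-numeric columns stay character instead of becoming factors. A minimal base R sketch on a small hypothetical data frame:

```r
# Small example data frame: IDs are non-numeric, ages are numeric strings
x <- data.frame(
  Household_ID = c("1A", "1B", "1C"),
  AGE1 = c("25", "-9", "39"),
  stringsAsFactors = FALSE
)

# Convert every column; Household_ID cannot be parsed as a number,
# so it stays character, while AGE1 becomes integer
converted <- type.convert(x, as.is = TRUE)
str(converted)
```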

Upvotes: 1

AnilGoyal

Reputation: 26218

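Why not simply use base R subsetting?

# Replace "R" with "-9" in columns 2 to 5 (the AGE columns); they remain character
df[,2:5][df[, 2:5] == 'R'] <- '-9'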

# A tibble: 5 x 5
  Household_ID AGE1  AGE2  AGE3  AGE4 
  <chr>        <chr> <chr> <chr> <chr>
1 1A           25    66    28    81   
2 1B           47    23    17    22   
3 1C           39    71    -9    48   
4 1D           50    -9    -9    59   
5 1E           -9    16    80    -9
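Note that the result above is still character. A hedged base R sketch (no dplyr) that picks the AGE columns by name rather than by position and then converts them to integer; selecting with grep("^AGE", ...) assumes every age column name starts with "AGE":

```r
# Same sample data as a plain data.frame
df <- data.frame(
  Household_ID = c("1A", "1B", "1C", "1D", "1E"),
  AGE1 = c("25", "47", "39", "50", "R"),
  AGE2 = c("66", "23", "71", "R", "16"),
  AGE3 = c("28", "17", "R", "R", "80"),
  AGE4 = c("81", "22", "48", "59", "R"),
  stringsAsFactors = FALSE
)

# Positions of the columns whose names start with "AGE"
age_cols <- grep("^AGE", names(df))

# Logical-matrix assignment: recode "R" to "-9" in the AGE columns only
df[age_cols][df[age_cols] == "R"] <- "-9"

# Convert each AGE column from character to integer
df[age_cols] <- lapply(df[age_cols], as.integer)
```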

Upvotes: 2
