Shawn Hemelstrand

Reputation: 3228

How to mutate shared value from similar and dissimilar variables using dplyr?

I had an oddly difficult time finding the answer to this probably much-answered question, so here is my query. Suppose I have a data frame like the one below:

df <- data.frame(age = c(30,40,-999,40,20),
           money.usd = c(-999,55,55,54,30),
           cars = c(1,1,2,0,-999))

Filtering and mutating values is straightforward for a single variable. For example, with an ifelse statement, I can turn the -999 in age into NA in the following way:

library(dplyr)

df %>%
  mutate(age = ifelse(age == -999, NA, age))  # bare NA; the string "NA" would turn age into character

However, since all of these variables contain this value and have different names, I was curious how I could apply this mutation across several variables at once. Additionally, if there are many similarly named variables alongside many dissimilar ones, I imagine the case is more complicated, but there are surely ways to make it easier. For example, suppose I have the following data with three variables for "cars":

df.2 <- data.frame(age = c(30,40,-999,40,20),
           money.usd = c(-999,55,55,54,30),
           cars.1 = c(1,1,2,0,-999),
           cars.2 = c(0,1,-999,0,0),
           cars.3 = c(-999,5,4,5,4))

How would one mutate the value for both age and money.usd while also selecting the several cars variables, so that -999 is replaced in all of them? To summarize, my main objective is to replace every -999 across the data frame with NA.

Upvotes: 0

Views: 54

Answers (1)

Maël

Reputation: 52004

You can use dplyr::na_if to replace a value with NA, and across to apply it to multiple columns.

library(dplyr)

df.2 %>% 
  mutate(across(everything(), ~ na_if(.x, -999)))
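
If only some columns should be touched, as with age, money.usd and the cars.* columns in the question, the same pattern can probably be narrowed with a tidyselect helper instead of everything(). A minimal sketch, assuming dplyr 1.0.0 or later (for across) and that the similar columns share the "cars" prefix; the name cleaned is only for illustration:

```r
library(dplyr)

# Same data as df.2 in the question
df.2 <- data.frame(age = c(30, 40, -999, 40, 20),
                   money.usd = c(-999, 55, 55, 54, 30),
                   cars.1 = c(1, 1, 2, 0, -999),
                   cars.2 = c(0, 1, -999, 0, 0),
                   cars.3 = c(-999, 5, 4, 5, 4))

# Name the dissimilar columns directly and match the similar ones by prefix
cleaned <- df.2 %>%
  mutate(across(c(age, money.usd, starts_with("cars")), ~ na_if(.x, -999)))
```

Any column not named or matched inside c(...) is left as it is, so this should also work when the data frame holds columns where -999 is a legitimate value.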

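If the replacement value is something other than NA, use replace instead (the example below still uses NA, but any value works in its place):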

df.2 %>% 
  mutate(across(everything(), ~ replace(.x, .x == -999, NA)))
#>   age money.usd cars.1 cars.2 cars.3
#> 1  30        NA      1      0     NA
#> 2  40        55      1      1      5
#> 3  NA        55      2     NA      4
#> 4  40        54      0      0      5
#> 5  20        30     NA      0      4

Or in base R:

df.2[df.2 == -999] <- NA
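
The base R line above overwrites df.2 in place and touches every column. If you would rather keep the original and limit the change to chosen columns, something along these lines should work. It is only a sketch: cols and cleaned are hypothetical names, and the "^cars" pattern assumes the similar columns share that prefix.

```r
# Same data as df.2 in the question
df.2 <- data.frame(age = c(30, 40, -999, 40, 20),
                   money.usd = c(-999, 55, 55, 54, 30),
                   cars.1 = c(1, 1, 2, 0, -999),
                   cars.2 = c(0, 1, -999, 0, 0),
                   cars.3 = c(-999, 5, 4, 5, 4))

# Columns to clean: two named explicitly, the rest matched by prefix
cols <- c("age", "money.usd", grep("^cars", names(df.2), value = TRUE))

# Work on a copy so df.2 itself is not modified
cleaned <- df.2
cleaned[cols] <- lapply(cleaned[cols], function(x) replace(x, x == -999, NA))
```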

Upvotes: 1
