Converting all binary (0, 1, NA) variables to factors

Question

I've got a big ~ 15,000 x 1,500 dataset that I've loaded in from an SPSS .sav file. Most of the variables are labeled, even the continuous ones. I'd like to take all variables that are clearly factors (i.e. the ones with only 0, 1, and NA values) and use the to_factor() function to turn them into factors. I've been trying to figure out a mutate_if() condition that would evaluate true for all variables with only (0, 1, NA) unique values, but I'm stuck.

library(tidyverse)

df <- tibble(X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50), 
             X4 = sample(c(0,1), 50, replace = TRUE),
             X5 = sample(c(0,1), 50, replace = TRUE), 
             X6 = rnorm(50), X7 = sample(c(0,1), 50, replace = TRUE))

# Here's a hacky way that I tried, doesn't work

df %>%
   mutate_if(sum(unique(.), na.rm = TRUE) == 1, ~ as.factor(.x))

akrun · Accepted Answer

We can pass a compound condition to mutate_if: check that the column is numeric (is.numeric) and that all of its unique values are %in% c(0, 1, NA). Columns that satisfy both checks are selected and converted to factor class.

library(dplyr)
df %>%
      mutate_if(~ is.numeric(.) && all(unique(.) %in% c(0, 1, NA)), factor)
# A tibble: 10 x 5
#         X1 X2    X3       X4 X5   
#      <dbl> <fct> <chr> <int> <fct>
# 1 -0.546   1     a        18 1    
# 2  0.537   1     b         1 1    
# 3  0.420   1     c         5 1    
# 4 -0.584   1     d        20 0    
# 5  0.847   1     e        11 0    
# 6  0.266   0     f        14 1    
# 7  0.445   1     g         6 1    
# 8 -0.466   <NA>  h         6 0    
# 9 -0.848   <NA>  i        14 0    
#10  0.00231 1     j         3 1
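To see why the condition works, it can help to try the predicate on plain vectors. The sketch below uses base R only; is_binary is a hypothetical helper name introduced for illustration, not something from the answer above. It relies on the fact that %in% treats NA as a match when NA is in the lookup set.

```r
# Hypothetical helper (illustration only): TRUE when x is numeric and
# every distinct value is 0, 1, or NA.
is_binary <- function(x) is.numeric(x) && all(unique(x) %in% c(0, 1, NA))

NA %in% c(0, 1, NA)             # TRUE: NA matches the NA in the lookup set

is_binary(c(0, 1, NA, 1))       # TRUE
is_binary(c(0L, 1L))            # TRUE: integer 0/1 also counts as numeric
is_binary(c(0, 1, 2))           # FALSE: 2 is outside the allowed set
is_binary(c("0", "1"))          # FALSE: character, so is.numeric() fails
is_binary(c(TRUE, FALSE))       # FALSE: logical is not numeric
is_binary(c(NA_real_, NA_real_))  # TRUE: an all-NA numeric column also passes
```

Note the last case: a numeric column that is entirely NA, or that contains only 0s or only 1s, also satisfies the condition, so it may be worth checking for such columns before converting.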

data

set.seed(24)
df <- tibble(X1 = rnorm(10), X2= sample(c(1, 0, NA), 10, replace = TRUE), X3 = letters[1:10], X4 = sample(20, 10,  replace = TRUE), X5 = sample(c(1, 0), 10, replace = TRUE))
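In dplyr 1.0.0 and later, the scoped verbs such as mutate_if are superseded by across(). A minimal sketch of the same idea with mutate(across(where(...))) follows, assuming dplyr >= 1.0.0 is installed; the object name out is just for illustration.

```r
library(dplyr)

set.seed(24)
df <- tibble(X1 = rnorm(10),
             X2 = sample(c(1, 0, NA), 10, replace = TRUE),
             X3 = letters[1:10],
             X4 = sample(20, 10, replace = TRUE),
             X5 = sample(c(1, 0), 10, replace = TRUE))

# Same condition as in the mutate_if call: numeric columns whose
# unique values are all in {0, 1, NA} are converted to factors.
out <- df %>%
  mutate(across(where(~ is.numeric(.x) && all(unique(.x) %in% c(0, 1, NA))),
                factor))

# Inspect the resulting column classes
sapply(out, class)
```

Here X2 and X5 should come back as factors, while X1 and X3 keep their original types.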
