Adam B.

Reputation: 1180

Converting all binary (0, 1, NA) variables to factors

I've got a large (~15,000 x 1,500) dataset that I've loaded from an SPSS .sav file. Most of the variables are labeled, even the continuous ones. I'd like to take all variables that are clearly factors (i.e. the ones with only 0, 1, and NA values) and use the to_factor() function to turn them into factors. I've been trying to figure out a mutate_if() condition that evaluates to TRUE for every variable whose only unique values are 0, 1, and NA, but I'm stuck.

library(tidyverse)

df <- tibble(X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50), 
             X4 = sample(c(0,1), 50, replace = TRUE),
             X5 = sample(c(0,1), 50, replace = TRUE), 
             X6 = rnorm(50), X7 = sample(c(0,1), 50, replace = TRUE))

# Here's a hacky way that I tried, doesn't work

df %>%
   mutate_if(sum(unique(.), na.rm = TRUE) == 1, ~ as.factor(.x))

Upvotes: 2

Views: 964

Answers (1)

akrun

Reputation: 887078

We can give mutate_if a single predicate that combines two conditions: the column is numeric (is.numeric) and all of its unique values are %in% 0, 1, or NA. Columns that satisfy both are selected and converted to factor class.

library(dplyr)
df %>%
      mutate_if(~ is.numeric(.) && all(unique(.) %in% c(0, 1, NA)), factor)
# A tibble: 10 x 5
#         X1 X2    X3       X4 X5   
#      <dbl> <fct> <chr> <int> <fct>
# 1 -0.546   1     a        18 1    
# 2  0.537   1     b         1 1    
# 3  0.420   1     c         5 1    
# 4 -0.584   1     d        20 0    
# 5  0.847   1     e        11 0    
# 6  0.266   0     f        14 1    
# 7  0.445   1     g         6 1    
# 8 -0.466   <NA>  h         6 0    
# 9 -0.848   <NA>  i        14 0    
#10  0.00231 1     j         3 1    

data

set.seed(24)
df <- tibble(X1 = rnorm(10), X2= sample(c(1, 0, NA), 10, replace = TRUE), X3 = letters[1:10], X4 = sample(20, 10,  replace = TRUE), X5 = sample(c(1, 0), 10, replace = TRUE))
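
In dplyr 1.0.0 and later, mutate_if() is superseded by across() combined with where(). The following is a minimal sketch of the same selection written that way, not part of the original answer. The helper name is_binary is hypothetical, introduced only for illustration, and the example reuses the data defined above.

library(dplyr)

# Hypothetical helper (an assumption, not from the original answer):
# TRUE for numeric columns whose values are all 0, 1, or NA.
is_binary <- function(x) {
  is.numeric(x) && all(x %in% c(0, 1, NA))
}

set.seed(24)
df <- tibble(X1 = rnorm(10),
             X2 = sample(c(1, 0, NA), 10, replace = TRUE),
             X3 = letters[1:10],
             X4 = sample(20, 10, replace = TRUE),
             X5 = sample(c(1, 0), 10, replace = TRUE))

# Convert every column that passes the predicate to a factor
df_fct <- df %>%
  mutate(across(where(is_binary), factor))

# Inspect the resulting column classes; X2 and X5 become factors
sapply(df_fct, class)

Note that a numeric column containing only NA values also passes this check, so it may be worth adding a condition such as !all(is.na(x)) if such columns exist in the real data.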

Upvotes: 4
