Reputation: 1180
I've got a big (~15,000 x 1,500) dataset that I've loaded in from an SPSS .sav file. Most of the variables are labeled, even the continuous ones. I'd like to take all variables that are clearly factors (i.e. the ones with only 0, 1, and NA values) and use the to_factor() function to turn them into factors. I've been trying to figure out a mutate_if() condition that would evaluate true for all variables with only (0, 1, NA) unique values, but I'm stuck.
library(tidyverse)
df <- tibble(X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50),
             X4 = sample(c(0,1), 50, replace = TRUE),
             X5 = sample(c(0,1), 50, replace = TRUE),
             X6 = rnorm(50), X7 = sample(c(0,1), 50, replace = TRUE))
# Here's a hacky way that I tried, doesn't work
df %>%
  mutate_if(sum(unique(.), na.rm = TRUE) == 1, ~ as.factor(.x))
Upvotes: 2
Views: 964
Reputation: 887078
We can pass two conditions in mutate_if to select the columns: check that the column is numeric (is.numeric) and that all of its unique values are %in% 0, 1 or NA. The selected columns are then converted to factor class.
library(dplyr)
df %>%
  mutate_if(~ is.numeric(.) && all(unique(.) %in% c(0, 1, NA)), factor)
# A tibble: 10 x 5
#          X1 X2    X3       X4 X5
#       <dbl> <fct> <chr> <int> <fct>
#  1 -0.546   1     a        18 1
#  2  0.537   1     b         1 1
#  3  0.420   1     c         5 1
#  4 -0.584   1     d        20 0
#  5  0.847   1     e        11 0
#  6  0.266   0     f        14 1
#  7  0.445   1     g         6 1
#  8 -0.466   <NA>  h         6 0
#  9 -0.848   <NA>  i        14 0
# 10  0.00231 1     j         3 1
# Data used to produce the output above
set.seed(24)
df <- tibble(X1 = rnorm(10),
             X2 = sample(c(1, 0, NA), 10, replace = TRUE),
             X3 = letters[1:10],
             X4 = sample(20, 10, replace = TRUE),
             X5 = sample(c(1, 0), 10, replace = TRUE))
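Since dplyr 1.0.0 the scoped verbs such as mutate_if are superseded by across(), so the same two-part condition can probably also be written with mutate(across(where(...))). The following is only a sketch under that assumption (dplyr >= 1.0.0 installed); the helper name is_binary and the small hand-written tibble are made up for illustration and are not part of the original question or answer.

```r
library(dplyr)

# Hypothetical helper (name is an assumption): TRUE when a column is
# numeric and every value is 0, 1, or NA
is_binary <- function(x) is.numeric(x) && all(x %in% c(0, 1, NA))

# Small illustrative data set with fixed values so the result is reproducible
df <- tibble(
  X1 = c(-0.5, 0.5, 1.2, 0.3),   # continuous, should stay numeric
  X2 = c(1, 0, NA, 1),           # 0/1/NA, should become a factor
  X3 = c("a", "b", "c", "d"),    # character, should be left alone
  X4 = c(18L, 1L, 5L, 20L),      # integer with other values, left alone
  X5 = c(1, 0, 0, 1)             # 0/1, should become a factor
)

# Convert only the columns for which is_binary() returns TRUE
out <- df %>%
  mutate(across(where(is_binary), factor))

# Inspect the resulting column classes
sapply(out, class)
```

Note that a numeric column consisting entirely of NA would also satisfy this condition and be converted, so it may be worth adding a check such as !all(is.na(x)) if such columns can occur in the data.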
Upvotes: 4