TarJae

Reputation: 78917

dplyr: mutate a factor column with 3 levels to 3 logical columns with TRUE and FALSE

In the iris dataset, Species is a factor variable with 3 levels ("setosa", "versicolor", "virginica"). I would like to create 3 additional columns named "setosa", "versicolor" and "virginica", each a logical variable with TRUE and FALSE values. In short: I would like to dichotomize the levels of Species into 3 new logical columns. My code works, but I wonder if there is a more direct way:

library(dplyr)

df <- iris %>%
  select(Species) %>%
  mutate(setosa = case_when(Species == "setosa" ~ 1,
                            TRUE ~ 0),
         versicolor = case_when(Species == "versicolor" ~ 1,
                                TRUE ~ 0),
         virginica = case_when(Species == "virginica" ~ 1,
                               TRUE ~ 0))
df$setosa <- as.logical(df$setosa)
df$versicolor <- as.logical(df$versicolor)
df$virginica <- as.logical(df$virginica)
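Because `==` already returns a logical vector, the `case_when()` recoding to 1/0 and the later `as.logical()` calls are not strictly needed. A minimal sketch, assuming dplyr is installed and the three level names are known in advance:

```r
library(dplyr)

# Each comparison yields TRUE/FALSE directly, so no 1/0 recoding is required
df <- iris %>%
  select(Species) %>%
  mutate(
    setosa     = Species == "setosa",
    versicolor = Species == "versicolor",
    virginica  = Species == "virginica"
  )

str(df)
```

This still hard-codes the level names; the answers below derive the columns from `levels(Species)` instead.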

Upvotes: 2

Views: 1759

Answers (3)

G. Grothendieck

Reputation: 269451

Use any of these (all assume `library(dplyr)` or `library(magrittr)` is loaded for `%>%` and its `.` placeholder):

iris %>% cbind(sapply(levels(.$Species), `==`, .$Species))

iris %>% cbind(model.matrix(~ Species + 0, .) == 1)

iris %>% cbind(outer(.$Species, setNames(levels(.$Species), levels(.$Species)), "=="))

expand_factor <- function(f) {
  m <- matrix(0, length(f), nlevels(f), dimnames = list(NULL, levels(f)))
  replace(m, cbind(seq_along(f), f), 1)
}
iris %>% cbind(expand_factor(.$Species) == 1)

library(nnet)
iris %>% cbind(class.ind(.$Species) == 1)
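One practical difference between these options is column naming: `model.matrix()` prefixes each column with the variable name (e.g. `Speciessetosa`), while the `sapply()` version uses the bare level names. A base-R sketch, assuming you want the bare level names in both cases:

```r
# sapply() over the levels: one logical column per level, named by level
m1 <- sapply(levels(iris$Species), `==`, iris$Species)

# model.matrix() without an intercept: 0/1 columns named "Species<level>"
m2 <- model.matrix(~ Species + 0, iris) == 1
colnames(m2) <- sub("^Species", "", colnames(m2))

# After stripping the prefix, both encodings hold the same values
print(colnames(m1))
print(all(m1 == m2))
```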

Upvotes: 5

user10917479


Here is another tidyverse way. I find it tedious and would not use it for anything as simple as your example, but it can be useful in more complex applications. For example, if you are one-hot encoding multiple variables, it can be convenient to keep each variable's encoding in a single column. You can then extract it without having to grab a different number of columns for each variable.

This stores a list() inside a tibble column and then unnests it into separate columns.

library(purrr)
library(dplyr)
library(tidyr)

iris %>% 
  mutate(species_one_hot = map(Species, ~ set_names(levels(Species) == .x, levels(Species)))) %>% 
  unnest_wider(species_one_hot)

You can also stop one step earlier and just store the encoding for later use:

iris2 <- iris %>% 
  mutate(species_one_hot = map(Species, ~ set_names(levels(Species) == .x, levels(Species))))

# now you can grab a single column and have the full encoding
bind_rows(iris2$species_one_hot)
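A similar "whole encoding in one column" layout can be sketched in base R with a matrix column instead of a list column. This is a different technique from the list-column approach above, and it assumes a plain `data.frame` (no tidyverse) is acceptable:

```r
iris2 <- iris
# Store a logical matrix (one column per level) as a single data.frame column
iris2$species_one_hot <- sapply(levels(iris$Species), `==`, iris$Species)

# The full encoding comes back as one object, with no unnesting step
enc <- iris2$species_one_hot
print(dim(enc))
print(head(enc, 3))
```

Matrix columns print and subset like ordinary columns in base R, but some packages handle them less gracefully than list columns, so test them with your downstream code.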

Upvotes: 1

Duck

Reputation: 39595

Try this: create a logical value column plus a copy of Species to supply the new column names, and then reshape to wide with tidyverse functions. You also need an id variable for the rows; otherwise pivot_wider() would collapse all rows that share the same species. Here is the code:

library(dplyr)
library(tidyr)
# Data
data(iris)
# Code
df <- iris %>%
  mutate(id = row_number(), Species2 = Species) %>%
  select(id, Species, Species2) %>%
  mutate(Value = TRUE) %>%
  pivot_wider(names_from = Species2, values_from = Value, values_fill = FALSE) %>%
  select(-id)

Output:

# A tibble: 150 x 4
   Species setosa versicolor virginica
   <fct>   <lgl>  <lgl>      <lgl>    
 1 setosa  TRUE   FALSE      FALSE    
 2 setosa  TRUE   FALSE      FALSE    
 3 setosa  TRUE   FALSE      FALSE    
 4 setosa  TRUE   FALSE      FALSE    
 5 setosa  TRUE   FALSE      FALSE    
 6 setosa  TRUE   FALSE      FALSE    
 7 setosa  TRUE   FALSE      FALSE    
 8 setosa  TRUE   FALSE      FALSE    
 9 setosa  TRUE   FALSE      FALSE    
10 setosa  TRUE   FALSE      FALSE    
# ... with 140 more rows
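To confirm that a result like this is a valid one-hot encoding, you can check that every row has exactly one TRUE and that the column totals match the level counts. A base-R sketch; `is_one_hot()` is a hypothetical helper, not part of any package, and it could equally be applied to `df[, -1]` from the code above:

```r
# Hypothetical helper: TRUE when every row of a logical matrix or data frame
# contains exactly one TRUE value
is_one_hot <- function(x) {
  x <- as.matrix(x)
  is.logical(x) && all(rowSums(x) == 1)
}

enc <- sapply(levels(iris$Species), `==`, iris$Species)
print(is_one_hot(enc))

# Column totals should equal the number of rows per level
print(colSums(enc))
print(table(iris$Species))
```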

Upvotes: 1
