joy_1379

Reputation: 499

Creating dummy variables from a variable with multiple values in R

I have the data frame below, and I want to add a new column that is 1 if the languages spoken include English (Eng) and 0 otherwise.

language_spoken
         Jap;Fre
         Jap;Fre
         Fre;Ch
         Eng
         Eng;Jap
         Hindi;Eng
               
         Eng;Spanish;Fre
         Spanish;Jap
         Spanish

Desired final data frame:

      language_spoken   Eng
         Jap;Fre         0
         Jap;Fre         0
         Fre;Ch          0
         Eng             1
         Eng;Jap         1
         Hindi;Eng       1
                         0
         Eng;Spanish;Fre 1
         Spanish;Jap     0
         Spanish         0

What I tried is below, but it's not working:

     b <- data.frame(model.matrix(~.-1,data))
     b

Sample data set below:

     data <- data.frame(language_spoken = c("Jap;Fre","Jap;Fre","Fre;Ch","Eng","Eng;Jap","Hindi;Eng","","Eng;Spanish;Fre","Spanish;Jap","Spanish"))

Upvotes: 0

Views: 246

Answers (2)

Karthik S

Reputation: 11584

Does this work?

library(dplyr)
# The unary + converts the logical result of grepl() to integer 0/1
data %>% mutate(Eng = +grepl('Eng', language_spoken))
            language_spoken Eng
1                   Jap;Fre   0
2                   Jap;Fre   0
3                    Fre;Ch   0
4                       Eng   1
5                   Eng;Jap   1
6                 Hindi;Eng   1
7                             0
8           Eng;Spanish;Fre   1
9               Spanish;Jap   0
10                  Spanish   0

Upvotes: 1

TimTeaFan

Reputation: 18541

In base R this will work:

data$Eng <- as.integer(grepl("Eng", data$language_spoken))

data
#>    language_spoken Eng
#> 1          Jap;Fre   0
#> 2          Jap;Fre   0
#> 3           Fre;Ch   0
#> 4              Eng   1
#> 5          Eng;Jap   1
#> 6        Hindi;Eng   1
#> 7                    0
#> 8  Eng;Spanish;Fre   1
#> 9      Spanish;Jap   0
#> 10         Spanish   0
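
One caveat: `grepl("Eng", ...)` matches "Eng" anywhere in the string, so a hypothetical code such as "Eng-US" or "NewEng" would also be flagged. If the codes need to match exactly, a minimal sketch (assuming ";" is always the separator and there is no surrounding whitespace) is to split each entry first and test membership:

```r
# Sample data from the question
data <- data.frame(
  language_spoken = c("Jap;Fre", "Jap;Fre", "Fre;Ch", "Eng", "Eng;Jap",
                      "Hindi;Eng", "", "Eng;Spanish;Fre", "Spanish;Jap", "Spanish")
)

# Split every entry on the literal ";" (as.character guards against factors)
tokens <- strsplit(as.character(data$language_spoken), ";", fixed = TRUE)

# 1 if "Eng" is one of the exact tokens, 0 otherwise (the empty entry gives 0)
data$Eng <- as.integer(vapply(tokens, function(x) "Eng" %in% x, logical(1)))
```

For the sample data this should give the same column as the `grepl()` version; the two only differ when one code is a substring of another.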

This would be one tidyverse approach:

library(dplyr)
library(stringr)

data %>%
  mutate(Eng = as.numeric(str_detect(language_spoken, "Eng")))

#>    language_spoken Eng
#> 1          Jap;Fre   0
#> 2          Jap;Fre   0
#> 3           Fre;Ch   0
#> 4              Eng   1
#> 5          Eng;Jap   1
#> 6        Hindi;Eng   1
#> 7                    0
#> 8  Eng;Spanish;Fre   1
#> 9      Spanish;Jap   0
#> 10         Spanish   0

Created on 2021-07-21 by the reprex package (v0.3.0)
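
If you eventually want a dummy column for every language rather than only Eng, the same idea extends in base R. This is only a sketch under the assumption that ";" is the sole separator; the names `tokens`, `langs`, `dummies` and `result` are mine, not from the question:

```r
# Sample data from the question
data <- data.frame(
  language_spoken = c("Jap;Fre", "Jap;Fre", "Fre;Ch", "Eng", "Eng;Jap",
                      "Hindi;Eng", "", "Eng;Spanish;Fre", "Spanish;Jap", "Spanish")
)

# Split each entry into its individual language codes
tokens <- strsplit(as.character(data$language_spoken), ";", fixed = TRUE)

# All distinct codes that occur anywhere in the column
langs <- sort(unique(unlist(tokens)))

# One 0/1 column per code: rows are observations, columns are languages
dummies <- vapply(
  langs,
  function(l) vapply(tokens, function(x) as.integer(l %in% x), integer(1)),
  integer(length(tokens))
)

# Attach the dummy columns to the original data frame
result <- cbind(data, dummies)
```

The row with the empty string gets 0 in every column, since it contains no codes at all.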

Upvotes: 2
