milan

Reputation: 4970

Convert variable categories in an R data frame into new logical columns

I would like to convert the first dataframe to the second:

data.frame(sex=c('M', 'M', 'F'), age=c('old', 'young', 'old'))
  sex   age
1   M   old
2   M young
3   F   old

data.frame(sex_M=c(T,T,F), sex_F=c(F, F, T), age_old=c(T, F, T), age_young=c(F, T, F))
  sex_M sex_F age_old age_young
1  TRUE FALSE    TRUE     FALSE
2  TRUE FALSE   FALSE      TRUE
3 FALSE  TRUE    TRUE     FALSE

Upvotes: 0

Views: 44

Answers (4)

meriops

Reputation: 1037

For fun, here's another tidyverse approach, using across():

library(tidyverse)

df <- data.frame(sex = c("M", "M", "F"), age = c("old", "young", "old"))

newdf <-
  df %>%
  # across() names new columns "{.col}_{.fn}", so the list names must be the bare levels
  mutate(across(sex, list(M = ~ .x == "M", F = ~ .x == "F"))) %>%
  mutate(across(age, list(old = ~ .x == "old", young = ~ .x == "young"))) %>%
  select(-sex, -age)

Upvotes: 0

alex_jwb90

Reputation: 1713

You can use fastDummies::dummy_cols like so:

library(fastDummies)

df <- data.frame(sex=c('M', 'M', 'F'), age=c('old', 'young', 'old'))

df2 <- dummy_cols(
    .data = df,
    select_columns = c("sex", "age"),
    remove_selected_columns = TRUE
  )
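
Note that the question asks for logical columns, while dummy columns of this kind are usually stored as 0/1 integers. A minimal base R sketch of the conversion, assuming 0/1 columns; `dummies` is a hand-made stand-in used for illustration, not actual `dummy_cols` output:

```r
# Hypothetical stand-in for a data frame of 0/1 indicator columns
dummies <- data.frame(sex_F = c(0L, 0L, 1L), sex_M = c(1L, 1L, 0L))

# Convert every column to logical; `[] <-` keeps the data.frame structure
dummies[] <- lapply(dummies, as.logical)
```

The same line could be applied to `df2` above if its columns turn out to be numeric.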

Upvotes: 0

Duck

Reputation: 39595

Try this tidyverse approach. You can reshape your data to long format, then compute a new variable name and set the logical value to TRUE. After that you can reshape back to wide format and replace the resulting NA values with FALSE. Here is the code:

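library(tidyverse)
# Reshape to long, build the new column names, then reshape back to wide
df %>%
  mutate(id = 1:n()) %>%
  pivot_longer(-id) %>%
  mutate(var = paste0(name, ".", value), val = TRUE) %>%
  select(-c(value, name)) %>%
  pivot_wider(names_from = var, values_from = val) %>%
  replace(is.na(.), FALSE) %>%  # combinations that did not occur become FALSE
  select(-id)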

Output:

# A tibble: 3 x 4
  sex.M age.old age.young sex.F
  <lgl> <lgl>   <lgl>     <lgl>
1 TRUE  TRUE    FALSE     FALSE
2 TRUE  FALSE   TRUE      FALSE
3 FALSE TRUE    FALSE     TRUE 

The data used:

#Data
df <- structure(list(sex = structure(c(2L, 2L, 1L), .Label = c("F", 
"M"), class = "factor"), age = structure(c(1L, 2L, 1L), .Label = c("old", 
"young"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))
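
As a possible variation on the reshape above, `pivot_wider()` can build the column names and fill the missing combinations itself through its `names_sep` and `values_fill` arguments. A minimal sketch, assuming the `sex_M`-style names from the question are wanted; the result name `wide` is only illustrative:

```r
library(dplyr)
library(tidyr)

df <- data.frame(sex = c("M", "M", "F"), age = c("old", "young", "old"))

wide <- df %>%
  mutate(id = row_number()) %>%        # row id so each row can be rebuilt
  pivot_longer(-id) %>%                # one row per (id, column, value)
  mutate(val = TRUE) %>%
  pivot_wider(
    names_from = c(name, value),       # new names are "<column>_<value>"
    names_sep = "_",
    values_from = val,
    values_fill = FALSE                # absent combinations become FALSE
  ) %>%
  select(-id)
```

This drops the separate `paste0()` and `replace()` steps; the columns still come out in order of first appearance rather than grouped by variable.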

Upvotes: 1

Pedro Faria

Reputation: 869

You just need to apply logical tests to your table that describe the columns you want, like this:

df_test <- data.frame(
  sex_M = df$sex == "M",
  sex_F = df$sex == "F",
  age_old = df$age == "old",
  age_young = df$age == "young"
)  

This results in:

  sex_M sex_F age_old age_young
1  TRUE FALSE    TRUE     FALSE
2  TRUE FALSE   FALSE      TRUE
3 FALSE  TRUE    TRUE     FALSE
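
If there are many columns or levels, the same tests can be generated instead of typed by hand. A minimal base R sketch, assuming every column is categorical and has no missing values; `to_logical_cols` is a hypothetical helper name, not an existing function:

```r
df <- data.frame(sex = c("M", "M", "F"), age = c("old", "young", "old"))

# Hypothetical helper: one logical column per (column, value) pair
to_logical_cols <- function(d) {
  cols <- lapply(names(d), function(nm) {
    x <- as.character(d[[nm]])
    vals <- unique(x)                        # values in order of first appearance
    out <- lapply(vals, function(v) x == v)  # one logical vector per value
    names(out) <- paste(nm, vals, sep = "_")
    out
  })
  # Flatten the per-column lists into one named list, then build the data frame
  as.data.frame(do.call(c, cols))
}

df_test2 <- to_logical_cols(df)
```

For the example data this should give the same four columns as the hand-written version.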

Here is the input data:

df <- data.frame(sex=c('M', 'M', 'F'), age=c('old', 'young', 'old'))

  sex   age
1   M   old
2   M young
3   F   old

Upvotes: 0
