user14345711

Reputation: 19

How to generate multiple dummy variables?

data1 <- data.frame(zone1 = c("A","A","A","A","B"),
                    zone2 = c("a","a","a","a","b"),
                    name = c("apple","pear","pine","banana","orange"),
                    locate = c("poor","poor","room","room","room"),
                    time = c(2000,2000,2000,2000,2001))
# desired result: one row per locate, with a 1/0 dummy for every value
data2 <- data.frame(locate = c("poor","room"),
                    A = c(1,1),
                    B = c(0,1),
                    a = c(1,1),
                    b = c(0,1),
                    apple = c(1,0),
                    pear = c(1,0),
                    pine = c(0,1),
                    banana = c(0,1),
                    orange = c(0,1),
                    "2000" = c(1,1),
                    "2001" = c(0,1),
                    check.names = FALSE)  # keep the names "2000" and "2001" as-is

I tried to use the spread() function to do this:

library(dplyr)
library(tidyr)

dums <- data1 %>%
   select("locate", everything()) %>%
   mutate(zone1yes = 1,
          timeyes = 1,
          zone2yes = 1,
          nameyes = 1) %>%
   spread(zone1, zone1yes) %>%
   spread(time, timeyes) %>%
   spread(name, nameyes) %>%
   spread(zone2, zone2yes)

But I found that there are some errors and I don't know why. How can I do this?
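One likely reason the chained spread() calls do not reproduce data2: spread() uses every remaining column to identify rows, so the five original rows are never collapsed into one row per locate, and combinations that do not occur come out as NA rather than 0. A minimal sketch that stays with the (now superseded) gather()/spread() verbs, assuming the dplyr and tidyr packages, stacks the columns once, drops duplicate locate/value pairs, and then spreads a single indicator column with fill = 0L. The column names variable, value and present are arbitrary choices for this sketch.

```r
library(dplyr)
library(tidyr)

data1 <- data.frame(zone1 = c("A","A","A","A","B"),
                    zone2 = c("a","a","a","a","b"),
                    name = c("apple","pear","pine","banana","orange"),
                    locate = c("poor","poor","room","room","room"),
                    time = c(2000,2000,2000,2000,2001))

dums <- data1 %>%
  # make every column character so they can be stacked into one column
  mutate(time = as.character(time)) %>%
  # long format: one row per (locate, original column, value)
  gather(key = "variable", value = "value", -locate) %>%
  # keep each locate/value pair only once
  distinct(locate, value) %>%
  # indicator column to spread
  mutate(present = 1L) %>%
  # a single spread: one row per locate, missing combinations filled with 0
  spread(value, present, fill = 0L)

dums
```

Because only locate is left besides the key and value columns, spread() produces exactly one row per locate.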

Upvotes: 0

Views: 175

Answers (2)

akrun

Reputation: 887951

An option using melt/dcast from data.table:

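library(data.table)
# setDT() converts data1 to a data.table by reference; melt() gives the long
# format (locate, variable, value) and dcast() builds one column per distinct
# value, where the aggregate returns 1 if the locate/value cell has any rows
dcast(melt(setDT(data1), id.vars = 'locate'), locate ~ value, function(x) +(length(x) > 0))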

Output:

#   locate 2000 2001 A B a apple b banana orange pear pine
#1:   poor    1    0 1 0 1     1 0      0      0    1    0
#2:   room    1    1 1 1 1     0 1      1      1    0    1
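The compact call above relies on melt() coercing the mixed character/numeric columns to character, and on dcast() filling empty cells by applying the aggregate function to a zero-length vector, so +(length(x) > 0) yields 0 there. A more explicit sketch of the same melt/dcast idea, assuming data.table is installed: convert time to character first so melt() has nothing to coerce, count rows per cell with length, then collapse the counts to 0/1. The object names dt, long, wide and dummy_cols are illustrative.

```r
library(data.table)

data1 <- data.frame(zone1 = c("A","A","A","A","B"),
                    zone2 = c("a","a","a","a","b"),
                    name = c("apple","pear","pine","banana","orange"),
                    locate = c("poor","poor","room","room","room"),
                    time = c(2000,2000,2000,2000,2001))

dt <- as.data.table(data1)
# give all measure columns the same type before melting
dt[, time := as.character(time)]
# long format: one row per (locate, variable, value)
long <- melt(dt, id.vars = "locate")
# count rows per locate/value cell; empty cells get length() of a
# zero-length vector, i.e. 0
wide <- dcast(long, locate ~ value, fun.aggregate = length, value.var = "variable")
# counts can exceed 1 (e.g. "A" occurs twice for "poor"), so turn them into 0/1
dummy_cols <- setdiff(names(wide), "locate")
wide[, (dummy_cols) := lapply(.SD, function(x) as.integer(x > 0)), .SDcols = dummy_cols]

wide
```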

Data:

data1 <- structure(list(zone1 = c("A", "A", "A", "A", "B"), zone2 = c("a", 
"a", "a", "a", "b"), name = c("apple", "pear", "pine", "banana", 
"orange"), locate = c("poor", "poor", "room", "room", "room"), 
    time = c(2000, 2000, 2000, 2000, 2001)), class = "data.frame", 
row.names = c(NA, -5L))

Upvotes: 0

Ronak Shah

Reputation: 389325

Using dplyr and tidyr, you can do:

library(dplyr)
library(tidyr)

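data1 %>%
  # convert every column to character so they can share one value column
  mutate(across(everything(), as.character)) %>%
  # long format: one row per (locate, column name, value)
  pivot_longer(cols = -locate) %>%
  # back to wide: 1 where a value occurs for a locate, 0 otherwise
  pivot_wider(names_from = value, values_fill = 0, id_cols = locate, 
              values_fn = function(x) as.integer(length(x) > 0))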

#  locate     A     a apple `2000`  pear  pine banana     B     b orange `2001`
#  <chr>  <int> <int> <int>  <int> <int> <int>  <int> <int> <int>  <int>  <int>
#1 poor       1     1     1      1     1     0      0     0     0      0      0
#2 room       1     1     0      1     0     1      1     1     1      1      1

Since the columns have different types (character and numeric), we first convert them all to one type, i.e. character, so they can be stacked into a single column. We then reshape to long format and back to wide, assigning 1 where a value is present for a given locate and 0 otherwise.
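The same idea (stack into long format, then mark presence with 1 and absence with 0) can also be sketched in base R without extra packages, using table() to count each locate/value pair. This is an alternative to the tidyr pipeline, not what the answer above uses; the object names vars, long, counts and dummies are illustrative.

```r
data1 <- data.frame(zone1 = c("A","A","A","A","B"),
                    zone2 = c("a","a","a","a","b"),
                    name = c("apple","pear","pine","banana","orange"),
                    locate = c("poor","poor","room","room","room"),
                    time = c(2000,2000,2000,2000,2001))

vars <- setdiff(names(data1), "locate")
# stack the other columns column by column as character; repeating locate
# once per column keeps each value aligned with its original row
long <- data.frame(
  locate = rep(data1$locate, times = length(vars)),
  value  = unlist(lapply(data1[vars], as.character), use.names = FALSE)
)
# cross-tabulate locate against value, then turn counts into 1/0
counts <- table(long$locate, long$value)
dummies <- ifelse(unclass(counts) > 0, 1L, 0L)

dummies
```

The result is an integer matrix with locate as row names; as.data.frame.matrix(dummies) converts it to a data frame if needed.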

Data:

data1 <- structure(list(zone1 = c("A", "A", "A", "A", "B"), zone2 = c("a", 
"a", "a", "a", "b"), name = c("apple", "pear", "pine", "banana", 
"orange"), locate = c("poor", "poor", "room", "room", "room"), 
    time = c(2000, 2000, 2000, 2000, 2001)), class = "data.frame", 
row.names = c(NA, -5L))

Upvotes: 1
