Massimo2013

Reputation: 593

Dummies to factors (in data.table)

In a data frame, I want to turn a group of variables that currently act as dummies into a single categorical variable. For example, my data has several variables representing geographical regions:

City    North  Centre  South
----------------------------
Milan       1       0      0
Rome        0       1      0
Naples      0       0      1
Venice      1       0      0

df <- structure(list(City = c("Milan", "Rome", "Naples", "Venice"), 
North = c(1L, 0L, 0L, 1L), Centre = c(0L, 1L, 0L, 0L), South = c(0L, 
0L, 1L, 0L)), .Names = c("City", "North", "Centre", "South"
), row.names = c(NA, -4L), class = "data.frame")

I want to change it to:

City    Region
--------------
Milan    North
Rome    Centre
Naples   South
Venice   North

I can create the variable Region with dplyr with the following commands:

df %>% mutate(Region = case_when(
                      .$North==1 ~ "North", .$Centre==1 ~ "Centre", .$South==1 ~ "South"))

I wonder how to do the same with data.table, which I am currently learning, given that the function case_when is not available there. I am looking for a similar one-line solution.

Upvotes: 1

Views: 239

Answers (1)

thelatemail

Reputation: 93938

No need for packages at all (dat here is the data frame called df in the question):

names(dat[,-1])[max.col(dat[,-1])]
#[1] "North"  "Centre" "South"  "North"
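
For anyone who wants to run this from scratch, here is a self-contained sketch. It assumes dat is simply the question's df under another name. One detail worth knowing: max.col breaks ties at random by default, so passing ties.method = "first" keeps the result deterministic should a row ever contain more than one 1.

```r
# Rebuild the question's data under the name `dat` (assumed identical to `df`)
dat <- data.frame(
  City   = c("Milan", "Rome", "Naples", "Venice"),
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L)
)

dummies <- dat[, -1]                             # drop City, keep the dummy columns
idx <- max.col(dummies, ties.method = "first")   # column position of each row's maximum
dat$Region <- names(dummies)[idx]                # translate positions into column names

dat[, c("City", "Region")]
```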

If you want to massage it to data.table specifically (dat must be a data.table first, e.g. after setDT(dat)):

dat[, .(City, Region=names(.SD)[max.col(.SD)]), .SDcols=-1]
#     City Region
#1:  Milan  North
#2:   Rome Centre
#3: Naples  South
#4: Venice  North
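
If what you are after is a closer analogue of case_when, more recent data.table releases (1.13.0 and later, as far as I am aware) include fcase. The sketch below rebuilds the question's data under the hypothetical name dt and adds Region by reference; rows matching no condition would get NA.

```r
library(data.table)

# Same data as in the question, built directly as a data.table (name `dt` is illustrative)
dt <- data.table(
  City   = c("Milan", "Rome", "Naples", "Venice"),
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L)
)

# fcase() takes alternating condition/value pairs; the first TRUE condition wins
dt[, Region := fcase(
  North  == 1L, "North",
  Centre == 1L, "Centre",
  South  == 1L, "South"
)]

dt[, .(City, Region)]
```

Unlike the max.col approach, this needs every column spelled out by hand, so it is probably only convenient when there are few dummies.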

If speed is absolutely critical:

dat[, names(.SD)[Reduce(`+`, Map(`*`, .SD, seq_along(.SD)))], .SDcols=-1]
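
To see why that last one works, here is the same arithmetic spelled out on a plain list (the name dummies is just for illustration). Each column is multiplied by its position and the results are summed, so the sum is the position of the column holding the 1. This assumes exactly one 1 per row: a row of all zeros yields index 0, which is silently dropped when subsetting, and a row with several 1s yields a meaningless index.

```r
# The three dummy columns from the question, as a named list
dummies <- list(
  North  = c(1L, 0L, 0L, 1L),
  Centre = c(0L, 1L, 0L, 0L),
  South  = c(0L, 0L, 1L, 0L)
)

# Weight each column by its position: North * 1, Centre * 2, South * 3
weighted <- Map(`*`, dummies, seq_along(dummies))

# Element-wise sum across columns gives the position of the 1 in each row
idx <- Reduce(`+`, weighted)

# Look up the column name at each position
region <- names(dummies)[idx]
region
```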

Upvotes: 3
