In a data frame, I want to turn a group of dummy variables into a single categorical variable. For example, my data has several variables representing geographical regions:
City    North  Centre  South
----------------------------
Milan       1       0      0
Rome        0       1      0
Naples      0       0      1
Venice      1       0      0
df <- structure(list(City = c("Milan", "Rome", "Naples", "Venice"),
North = c(1L, 0L, 0L, 1L), Centre = c(0L, 1L, 0L, 0L), South = c(0L,
0L, 1L, 0L)), .Names = c("City", "North", "Centre", "South"
), row.names = c(NA, -4L), class = "data.frame")
I want to change it to:
City    Region
--------------
Milan   North
Rome    Centre
Naples  South
Venice  North
I can create the variable Region with dplyr using the following command:
library(dplyr)
df %>% mutate(Region = case_when(
  North == 1 ~ "North", Centre == 1 ~ "Centre", South == 1 ~ "South"))
I wonder how to do the same with data.table, which I am currently learning, given that the function case_when is not available there. I am looking for a similar one-line solution.
No need for packages at all (here dat is the data frame called df in the question):
names(dat[,-1])[max.col(dat[,-1])]
#[1] "North" "Centre" "South" "North"
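To spell out what the one-liner does: max.col() returns, for each row, the column position of the row maximum, and that position indexes into the dummy column names. One caveat worth knowing is that max.col() defaults to ties.method = "random", so a row with no 1 at all (every column tied at 0) gets a random region. The sketch below, which rebuilds the question's data under the name dat, uses ties.method = "first" to make the result deterministic and adds a guard (my own addition, not part of the answer) that each row has exactly one 1.

```r
# Rebuild the question's data; called dat here, as in the answer
dat <- data.frame(City = c("Milan", "Rome", "Naples", "Venice"),
                  North = c(1L, 0L, 0L, 1L),
                  Centre = c(0L, 1L, 0L, 0L),
                  South = c(0L, 0L, 1L, 0L))

# Dummy columns as a numeric matrix (max.col() converts to a matrix anyway)
m <- as.matrix(dat[, -1])

# Guard: every row must contain exactly one 1, otherwise the mapping is ambiguous
stopifnot(all(rowSums(m) == 1))

# Column position of the 1 in each row; "first" avoids random tie-breaking
idx <- max.col(m, ties.method = "first")
idx
#[1] 1 2 3 1

dat$Region <- colnames(m)[idx]
dat[, c("City", "Region")]
```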
If you want to do it with data.table specifically, first convert dat to a data.table by reference with setDT():
library(data.table)
setDT(dat)
dat[, .(City, Region = names(.SD)[max.col(.SD)]), .SDcols = -1]
#      City Region
# 1:  Milan  North
# 2:   Rome Centre
# 3: Naples  South
# 4: Venice  North
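For a closer analogue of the dplyr case_when() version: data.table releases from 1.13.0 onward include fcase(), which takes condition/value pairs in the same spirit. The sketch below assumes such a version is installed; the table name dt is just for illustration. It adds Region by reference with := and then drops the dummy columns. Unlike max.col(), a row matching no condition gets NA rather than a random or first column.

```r
library(data.table)

# The question's data, built directly as a data.table
dt <- data.table(City = c("Milan", "Rome", "Naples", "Venice"),
                 North = c(1L, 0L, 0L, 1L),
                 Centre = c(0L, 1L, 0L, 0L),
                 South = c(0L, 0L, 1L, 0L))

# fcase() evaluates condition/value pairs in order; unmatched rows get NA
dt[, Region := fcase(North == 1L, "North",
                     Centre == 1L, "Centre",
                     South == 1L, "South")]

# Remove the dummy columns by reference, leaving City and Region
dt[, c("North", "Centre", "South") := NULL]
dt
```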
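If speed is absolutely critical, this variant avoids the matrix conversion that max.col() performs. Note that it returns the region names as a character vector rather than a new table: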
dat[, names(.SD)[Reduce(`+`, Map(`*`, .SD, seq_along(.SD)))], .SDcols=-1]
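The Reduce/Map trick works because .SD is a list of columns: each dummy column is multiplied by its position, and adding the weighted columns element-wise yields the position of the 1 in each row. The standalone sketch below uses a plain named list (hypothetically named dummies) in place of .SD. It assumes exactly one 1 per row; an all-zero row would produce index 0, which R silently drops when subsetting, so the result would come back shorter than the table.

```r
# The dummy columns as a named list, standing in for .SD
dummies <- list(North  = c(1L, 0L, 0L, 1L),
                Centre = c(0L, 1L, 0L, 0L),
                South  = c(0L, 0L, 1L, 0L))

# Weight each column by its position: North * 1, Centre * 2, South * 3
weighted <- Map(`*`, dummies, seq_along(dummies))

# Element-wise sum of the weighted columns = position of the 1 in each row
idx <- Reduce(`+`, weighted)
idx
#[1] 1 2 3 1

# Map the positions back to the column names
names(dummies)[idx]
```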