Reputation: 15458
I have a data.table as follows:
library(data.table)
dt<-structure(list(varx = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L
), vary = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L)), class = c("data.table",
"data.frame"), row.names = c(NA, -10L))
dt
varx vary
1: 0 0
2: 1 0
3: 0 0
4: 0 0
5: 1 1
6: 0 0
7: 1 1
8: 0 0
9: 0 0
10: 0 0
and I am trying to get the following output:
dt
varx vary
1: varx_n vary_n
2: varx_y vary_n
3: varx_n vary_n
4: varx_n vary_n
5: varx_y vary_y
6: varx_n vary_n
7: varx_y vary_y
8: varx_n vary_n
9: varx_n vary_n
10: varx_n vary_n
using the following code:
dt[,lapply(.SD, function(x){
ifelse(x==1,paste0(.SD,"_y"),paste0(.SD,"_n"))
})]
However, I am not getting the desired output. Please help.
Upvotes: 2
Views: 117
Reputation: 47300
In base R:
dt[dt==0] <- "_n"
dt[dt=="1"] <- "_y"
dt[] <- Map(paste0,names(dt),dt)
# varx vary
# 1: varx_n vary_n
# 2: varx_y vary_n
# 3: varx_n vary_n
# 4: varx_n vary_n
# 5: varx_y vary_y
# 6: varx_n vary_n
# 7: varx_y vary_y
# 8: varx_n vary_n
# 9: varx_n vary_n
# 10: varx_n vary_n
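If modifying dt in place is acceptable, the same result can also be sketched with data.table's set(), which replaces each column by reference in a single pass. This is only a sketch, assuming every column holds nothing but 0/1; the table is rebuilt with data.table() so the example stands alone.

```r
library(data.table)

# Rebuild the example table (same values as in the question)
dt <- data.table(
  varx = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L),
  vary = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L)
)

# Overwrite each column by reference with "<name>_y" or "<name>_n"
for (col in names(dt)) {
  set(dt, j = col, value = paste0(col, ifelse(dt[[col]] == 1L, "_y", "_n")))
}
```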
Upvotes: 2
Reputation: 34703
The following works:
dt[ , lapply(setNames(nm = names(.SD)), function(nm_j)
sprintf('%s_%s', nm_j, c('n', 'y')[.SD[[nm_j]] + 1L]))]
# varx vary
# 1: varx_n vary_n
# 2: varx_y vary_n
# 3: varx_n vary_n
# 4: varx_n vary_n
# 5: varx_y vary_y
# 6: varx_n vary_n
# 7: varx_y vary_y
# 8: varx_n vary_n
# 9: varx_n vary_n
# 10: varx_n vary_n
The problem with your approach is that, in lapply(.SD, ...), the name of the current list element (i.e., the column name) is unknown within the scope of FUN: FUN only receives the column's values, and .SD inside it still refers to the whole table. To get around this, we loop over column names, which gives us access to both the column names and the contents of the columns.
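Another way to hand FUN both pieces is to walk the columns and their names in parallel with Map(). A minimal sketch, assuming the columns contain only 0/1 (the argument names x and nm are arbitrary, and the small table is made up for illustration):

```r
library(data.table)

dt <- data.table(varx = c(0L, 1L, 0L), vary = c(0L, 1L, 1L))

# x is the column's values, nm is its name; Map() reuses the names of .SD
res <- dt[, Map(function(x, nm) paste0(nm, ifelse(x == 1L, "_y", "_n")),
                .SD, names(.SD))]
```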
The setNames part is just for convenience; it can easily be broken out if you find it too code-golfy. It creates an object c(varx = 'varx', vary = 'vary'), which lets the output automatically get the right names. If we do lapply(names(.SD), ...), we'll have to clean up the column names afterwards.
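The naming behaviour can be checked outside data.table. In this small sketch, cols and toupper are stand-ins chosen for illustration:

```r
cols <- c("varx", "vary")

# setNames(nm = cols) names each element after itself
named <- setNames(nm = cols)

# lapply() carries the names of its input through to the result
with_names    <- names(lapply(named, toupper))
without_names <- names(lapply(cols, toupper))  # NULL: nothing to carry over
```

With an unnamed result, data.table falls back to default column names, which is the clean-up mentioned above.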
c('n', 'y')[idx + 1L] is a bit of a murky way of saying ifelse(idx, 'y', 'n') (one of the places where 0-based indexing would be nice); it can be replaced with that as you see fit. If your data is massive, you'll notice my version is faster.
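A quick check of that equivalence on a 0/1 vector (idx here is a made-up example). Note the two forms only agree for 0/1 input: for a value such as 2, the indexing form yields NA while ifelse() treats it as TRUE.

```r
idx <- c(0L, 1L, 1L, 0L)

# Shift 0/1 to positions 1/2 and index into the label vector
by_index <- c("n", "y")[idx + 1L]

# Same labels via ifelse(), which coerces idx to logical
by_ifelse <- ifelse(idx, "y", "n")

same <- identical(by_index, by_ifelse)
```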
Upvotes: 3
Reputation: 93813
Use Map and a bit of factor labelling to pair each variable name with the n/y label required.
dt[, Map(paste, names(dt), lapply(.SD,factor,labels=c("n","y")), sep="_")]
# varx vary
# 1: varx_n vary_n
# 2: varx_y vary_n
# 3: varx_n vary_n
# 4: varx_n vary_n
# 5: varx_y vary_y
# 6: varx_n vary_n
# 7: varx_y vary_y
# 8: varx_n vary_n
# 9: varx_n vary_n
#10: varx_n vary_n
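One caveat: factor(x, labels = c("n", "y")) raises an error if a column happens to contain only one distinct value, because the number of labels no longer matches the number of levels. A hedged variant that fixes the levels explicitly, assuming the data is always coded 0/1 (the small table is made up for illustration):

```r
library(data.table)

# vary contains no 1s, so it would have a single level by default
dt <- data.table(varx = c(0L, 1L, 0L), vary = c(0L, 0L, 0L))

# Declaring levels = 0:1 keeps both labels valid for every column
res <- dt[, Map(paste, names(.SD),
                lapply(.SD, factor, levels = 0:1, labels = c("n", "y")),
                sep = "_")]
```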
Upvotes: 6