Concatenating column names with column data in R (using data.table)

I have a data.table as follows:

library(data.table)

dt<-structure(list(varx = c(0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L
), vary = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L)), class = c("data.table", 
"data.frame"), row.names = c(NA, -10L))
dt
    varx vary
 1:    0    0
 2:    1    0
 3:    0    0
 4:    0    0
 5:    1    1
 6:    0    0
 7:    1    1
 8:    0    0
 9:    0    0
10:    0    0

and I am trying to get the following output:

dt 
    varx    vary
1:  varx_n  vary_n
2:  varx_y  vary_n
3:  varx_n  vary_n
4:  varx_n  vary_n
5:  varx_y  vary_y
6:  varx_n  vary_n
7:  varx_y  vary_y
8:  varx_n  vary_n
9:  varx_n  vary_n
10: varx_n  vary_n

using the following code:

dt[,lapply(.SD, function(x){
  ifelse(x==1,paste0(.SD,"_y"),paste0(.SD,"_n"))
})]

However, I am not getting the desired output. Please help.

Upvotes: 2

Answers (3)

moodymudskipper

Reputation: 47350

In base R, replace the values with their suffixes first, then prepend each column's name:

dt[dt==0] <- "_n" 
dt[dt=="1"] <- "_y" 
dt[] <- Map(paste0,names(dt),dt)
#       varx   vary
#  1: varx_n vary_n
#  2: varx_y vary_n
#  3: varx_n vary_n
#  4: varx_n vary_n
#  5: varx_y vary_y
#  6: varx_n vary_n
#  7: varx_y vary_y
#  8: varx_n vary_n
#  9: varx_n vary_n
# 10: varx_n vary_n

Upvotes: 2

MichaelChirico

Reputation: 34763

The following works:

dt[ , lapply(setNames(nm = names(.SD)), function(nm_j) 
  sprintf('%s_%s', nm_j, c('n', 'y')[.SD[[nm_j]] + 1L]))]
#       varx   vary
#  1: varx_n vary_n
#  2: varx_y vary_n
#  3: varx_n vary_n
#  4: varx_n vary_n
#  5: varx_y vary_y
#  6: varx_n vary_n
#  7: varx_y vary_y
#  8: varx_n vary_n
#  9: varx_n vary_n
# 10: varx_n vary_n

The problem with your approach is that, in lapply(.SD, ...), FUN only receives each column's values; the name of the current list element (i.e., the column name) is unknown inside its scope. To get around this, we loop over the column names instead, which gives us access to both the column names and the contents of the columns.
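To illustrate the point about names, here is a minimal sketch that uses a plain list in place of .SD. The name `cols` and its values are made up for the example and are not part of the original data.

```r
# A plain named list stands in for .SD; `cols` is a hypothetical name.
cols <- list(varx = c(0L, 1L), vary = c(0L, 0L))

# FUN is handed only each element's values, so the element name is not visible.
seen_names <- lapply(cols, function(x) names(x))
is.null(seen_names$varx)  # TRUE

# Looping over the names gives access to the name and, via [[, to the values.
out <- lapply(names(cols), function(nm) {
  paste0(nm, "_", c("n", "y")[cols[[nm]] + 1L])
})
out[[1]]    # "varx_n" "varx_y"
names(out)  # NULL: lapply() over a character vector returns an unnamed list
```

The unnamed result in the last line is why the column names would need fixing afterwards if the loop runs over bare names.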

The setNames part is just for convenience, and it can easily be broken out if you find it too code-golfy. setNames(nm = names(.SD)) creates the object c(varx = 'varx', vary = 'vary'), which lets the output automatically get the right names. If we do lapply(names(.SD), ...) instead, we'll have to clean up the column names afterwards.

c('n', 'y')[idx + 1L] is a bit of a murky way of saying ifelse(idx, 'y', 'n') (one of the places where 0-based indexing would be nice); it can be replaced with that as you see fit. If your data is massive, you should notice that the indexing version is faster than ifelse.
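The two idioms can be tried in isolation on plain vectors. This is only a sketch; `col_names` and `idx` are illustrative names that do not appear in the answer's code.

```r
# Illustrative inputs only; `col_names` and `idx` are made-up names.
col_names <- c("varx", "vary")
idx <- c(0L, 1L, 0L)

# setNames(nm = x) names a vector by its own values, so a later lapply()
# over it returns a list that already carries the right names.
named_cols <- setNames(nm = col_names)
names(named_cols)  # "varx" "vary"

# Adding 1L turns the 0/1 flags into valid 1-based positions in c("n", "y").
by_index  <- c("n", "y")[idx + 1L]
by_ifelse <- ifelse(idx == 1L, "y", "n")
identical(by_index, by_ifelse)  # TRUE

sprintf("%s_%s", "varx", by_index)  # "varx_n" "varx_y" "varx_n"
```

Note that the indexing trick relies on the column holding only 0 and 1; any other value would pick a missing position and yield NA.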

Upvotes: 3

thelatemail

Reputation: 93938

Use Map and a bit of factor labelling to pair each variable name with the n/y label required.

dt[, Map(paste, names(dt), lapply(.SD,factor,labels=c("n","y")), sep="_")]

#      varx   vary
# 1: varx_n vary_n
# 2: varx_y vary_n
# 3: varx_n vary_n
# 4: varx_n vary_n
# 5: varx_y vary_y
# 6: varx_n vary_n
# 7: varx_y vary_y
# 8: varx_n vary_n
# 9: varx_n vary_n
#10: varx_n vary_n
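One caveat worth sketching: factor(x, labels = c("n", "y")) derives its levels from the values present, so a column that happens to contain only 0s (or only 1s) has a single level and the two labels no longer fit. A possible variant, assuming the columns are strictly 0/1, fixes the levels explicitly. The table `dt2` and its column `varz` are hypothetical and used only for illustration.

```r
library(data.table)

# Hypothetical data: `varz` contains only zeros.
dt2 <- data.table(varx = c(0L, 1L, 0L), varz = c(0L, 0L, 0L))

# Passing levels = 0:1 keeps both labels valid even when one value is absent.
res <- dt2[, Map(paste, names(.SD),
                 lapply(.SD, factor, levels = 0:1, labels = c("n", "y")),
                 sep = "_")]
res$varx  # "varx_n" "varx_y" "varx_n"
res$varz  # "varz_n" "varz_n" "varz_n"
```

With the question's data both values occur in every column, so the original one-liner works there as posted.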

Upvotes: 6
