Reputation: 1430

numeric comparison on multiple columns using referenced name for columns in data.table

I've got a data.table DT with a few hundred columns named eg1, eg2, eg3, eg4, ..., eg10, ...

I'd like to generate new columns for all columns except the first one (eg1) based on comparison of each column with its previous column.

DT:

eg1 eg2 eg3 eg4 eg5
1   1   1   0   0
0   0   0   0   0
0   1   0   1   1
0   0   1   1   1
1   0   0   0   0
1   1   1   1   1

and I want:

eg1 eg2 eg3 eg4 eg5    eg2_m    eg3_m   eg4_m   eg5_m
1   1   1   0   0      0    0   1   0
0   0   0   0   0      0    0   0   0
0   1   0   1   1      1    1   1   0
0   0   1   1   1      0    1   0   0
1   0   0   0   0      1    0   0   0
1   1   1   1   1      0    0   0   0

you will note that the newly generated columns are 1 when the original column has had a value change from the previous one. For example, eg2_m is 0 if the value of eg2 is the same as the value of eg1 in that row, otherwise eg2_m is 1. and eg3_m value is 0 if value of eg3 is the same as value of eg2, otherwise value of eg3_m is 1.

I've got hundreds of these egX columns and I don't know where they are located in DT. So, i'd like to reference the columns by a name vector:

Colnames <- paste("eg", (1:500), sep = '')

I've tried something like this and it doesn't work (i know it's anything but sophisticated)

for (i in 2:length(Colnames))

  result <- DT[, (paste(Colnames[i], '_m', sep = '')) := ifelse( 
    `Colnames[i]` < `Colnames[i-1]` 
    , 1,0)
    ]

The error i get is

object 'Colnames[i]' not found

Upvotes: 1

Answers (3)

minem

Reputation: 3650

Using Colnames:

Colnames <- paste("eg", (1:5), sep = '')
xx <- x[, Colnames, with = F]
xx <- (xx[, -1] != xx[, -ncol(xx), with = F]) + 0L
xx <- as.data.table(xx)
setnames(xx, paste0(colnames(xx), '_m'))
x <- cbind(x, xx)
x
#    eg1 eg2 eg3 eg4 eg5 eg2_m eg3_m eg4_m eg5_m
# 1:   1   1   1   0   0     0     0     1     0
# 2:   0   0   0   0   0     0     0     0     0
# 3:   0   1   0   1   1     1     1     1     0
# 4:   0   0   1   1   1     0     1     0     0
# 5:   1   0   0   0   0     1     0     0     0
# 6:   1   1   1   1   1     0     0     0     0

Upvotes: 1

Alex

Reputation: 13

Try this: let X be your data table (mine is populated with binomial variates).

X <- matrix(rbinom(100,1,0.4),nrow=10)
X <- as.data.frame(X)

y <- X[,1:9] != X[,2:10]
y <- y + 0

Then rename the colnames of y accordingly.

Upvotes: 0

A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

Try:

> mydf[, .SD[, 1:(ncol(.SD)-1)] != .SD[, -1]] + 0L
     eg1 eg2 eg3 eg4
[1,]   0   0   1   0
[2,]   0   0   0   0
[3,]   1   1   1   0
[4,]   0   1   0   0
[5,]   1   0   0   0
[6,]   0   0   0   0

"mydf" is defined as:

mydf <- structure(list(eg1 = c(1L, 0L, 0L, 0L, 1L, 1L), eg2 = c(1L, 0L,                               
         1L, 0L, 0L, 1L), eg3 = c(1L, 0L, 0L, 1L, 0L, 1L), eg4 = c(0L,                                     
         0L, 1L, 1L, 0L, 1L), eg5 = c(0L, 0L, 1L, 1L, 0L, 1L)), 
         .Names = c("eg1", "eg2", "eg3", "eg4", "eg5"), row.names = c(NA, 
         6L), class = c("data.table", "data.frame"))

Upvotes: 2

numeric comparison on multiple columns using referenced name for columns in data.table

Answers (3)

Related Questions