Justin
Justin

Reputation: 37

R data table: modify column values by referencing other columns by name

I have a melted data table with a column containing values that refer to other column names within the same table. I want to replace each row within that same column with the row value of the referenced column.

library("data.table")

## Example input data table
DT_input <- data.table(A=c(1:10),
                       B=c(11:20),
                       C=c(21:30),
                       replace=c(rep("A", 5), rep("B", 3), rep("C", 2)))

## Desired output data table
DT_output <- data.table(A=c(1:10),
                        B=c(11:20),
                        C=c(21:30),
                        replace=c(1:5, 16:18, 29:30))

My old approach shown here is very slow because of the for loop:

## Attempted looping solution
for (kRow in seq_len(nrow(DT_input))) {
    e <- parse(text = DT_input[kRow, Variable])
    DT_input[kRow, Variable := eval(e)]
}

Upvotes: 1

Views: 529

Answers (2)

chinsoon12
chinsoon12

Reputation: 25225

An option using data.table:

DT_input[, rn := .I]
DT_input[, replace := 
    DT_input[, DT_input[.SD, on=c("rn", .BY$replace), get(.BY$replace)], .(replace)]$V1
]

output:

     A  B  C replace
 1:  1 11 21       1
 2:  2 12 22       2
 3:  3 13 23       3
 4:  4 14 24       4
 5:  5 15 25       5
 6:  6 16 26      16
 7:  7 17 27      17
 8:  8 18 28      18
 9:  9 19 29      29
10: 10 20 30      30

It will be slower than Akrun base R method.

Upvotes: 1

akrun
akrun

Reputation: 886938

If we need a vectorized approach use the row/column indexing from base R

i1 <- cbind(seq_len(nrow(df1)), match(df1$replace, names(df1)[-4]))
df1$replace <- df1[-4][i1]
df1$replace
#[1]  1  2  3  4  5 16 17 18 29 30

With data.table, an option is Map or for loop without the eval, but it would be still not vectorized

data

df1 <- as.data.frame(DT_input)

Upvotes: 1

Related Questions