Reputation: 330

Efficiently update multiple data.table columns based on a single condition

Here's a quick example of what i need:

data <- data.table(x = c(10, -10, 20, -20, 100, -100),
                 y = c(10, -10, 120, -120, 100, -100),
                 z = c(10, -10, 100, -100, 30, -30))

data[x > 90 | x < -90, x:=0]
data[z > 90 | z < -90, z:=0]

This works fine, but I'm looking to replace the last two rows with a simpler version, because there's a lot of columns in my actual data (about 200, out of which i need to update about 50) and a lot of rows (about 2 million, so the data is somewhat heavy)

Apologies if this is a duplicate

Upvotes: 3

Answers (3)

Jasam

Reputation: 11

You can update using update for reference in Functional form:

data[z > 90 | z < -90 , ':=' (x=0, z=0)]

You can check more info: https://www.rdocumentation.org/packages/data.table/versions/1.12.8/topics/%3A%3D

Upvotes: 1

Roland

Reputation: 132969

Probably easiest and most efficient with eval(parse(...)):

cols <- c("x", "z")
for (col in cols) 
  eval(parse(text = sprintf("data[%s > 90 | %s < -90, %s := 0][]", col, col, col)))
#     x    y   z
#1:  10   10  10
#2: -10  -10 -10
#3:  20  120   0
#4: -20 -120   0
#5:   0  100  30
#6:   0 -100 -30

However, I would probably melt the data.table:

data <- melt(data)
cols <- c("x", "z")
data[variable %in% cols & (value > 90 | value < -90), value := 0]
#    variable value
# 1:        x    10
# 2:        x   -10
# 3:        x    20
# 4:        x   -20
# 5:        x     0
# 6:        x     0
# 7:        y    10
# 8:        y   -10
# 9:        y   120
#10:        y  -120
#11:        y   100
#12:        y  -100
#13:        z    10
#14:        z   -10
#15:        z     0
#16:        z     0
#17:        z    30
#18:        z   -30

Upvotes: 5

akrun

Reputation: 887901

We can use set from data.table

for(j in names(data)[c(1,3)]){
  set(data, i = which(data[[j]]>90 |data[[j]] < -90), j = j, value = 0)
 }

data
#     x    y   z
#1:  10   10  10
#2: -10  -10 -10
#3:  20  120   0
#4: -20 -120   0
#5:   0  100  30
#6:   0 -100 -30

Upvotes: 2

Efficiently update multiple data.table columns based on a single condition

Answers (3)

Related Questions