Reputation: 13
I have a data.table and would like to do the following:
Here is an example: Data table:
dt <- data.table(ename = rep(c('a','b'),10),
tcode = rep(c(100,200,300,400),5),
tcdes = rep(c('EFG','HIJ','KLM','NGH'),5),
hours = rep(c(8),20))
# output
# ename tcode tcdes hours
# 1: a 100 EFG 8
# 2: b 200 HIJ 8
# 3: a 300 KLM 8
# 4: b 400 NGH 8
# 5: a 100 EFG 8
# 6: b 200 HIJ 8
# 7: a 300 KLM 8
# 8: b 400 NGH 8
# 9: a 100 EFG 8
# 10: b 200 HIJ 8
# 11: a 300 KLM 8
# 12: b 400 NGH 8
# 13: a 100 EFG 8
# 14: b 200 HIJ 8
# 15: a 300 KLM 8
# 16: b 400 NGH 8
# 17: a 100 EFG 8
# 18: b 200 HIJ 8
# 19: a 300 KLM 8
# 20: b 400 NGH 8
#1. Filter the data by multiple conditions:
dt[(ename == 'b'& tcode == 400),]
# output
# ename tcode tcdes hours
#1: b 400 NGH 8
#2: b 400 NGH 8
#3: b 400 NGH 8
#4: b 400 NGH 8
#5: b 400 NGH 8
The function I would like to apply is:
a. take a certain % (portion) of this filtered data (say 70% which will gives first 3.5 rows - so flooring this value to get 3 rows (rows: 1 to 3))
b. multiply the hours column by 0.7 for this portion
Modify the tcode and tcdes columns for this portion of data to:
a. tcode = 230
b. tcdes = "JKL"
Modify the tcode and tcdes columns of the remaining portion (2 rows: 4 & 5) of the filtered data to:
a. tcode = 340
b. tcdes = "BVH"
5. The filtered result should look like:
dt[(ename == 'b'& tcode == 230 & tcode == 340),]
# output
# ename tcode tcdes hours
#1: b 230 JKL 5.6
#2: b 230 JKL 5.6
#3: b 230 JKL 5.6
#4: b 340 BVH 8
#5: b 340 BVH 8
I am new to R as a whole and will greatly appreciate the help. I have tried to use the lapply() with .SD, .SDcols and :=, :=
but cannot get the function to be applied to a portion and maintain the full data.table.
Many thanks.
Upvotes: 1
Views: 140
Reputation: 887163
We could create a list
or data.table
with the values and then either do a join or use Map
to replace the values
library(data.table)
newdt <- data.table(tcode = c(230, 340), tcdes = c("JKL", "BVH"), hours = c(0.7, 1))
dt[(ename == 'b'& tcode == 400), names(newdt) := {
i1 <- as.integer(0.7 * .N)
i2 <- rep(1:2, c(i1, .N - i1))
Map(function(x, y, z) if(z) x[i2] * y[i2] else y[i2],
.SD, newdt, c(FALSE, FALSE, TRUE))
}, .SDcols = names(newdt)]
dt[(ename == 'b'& tcode == 230 | tcode == 340)]
# ename tcode tcdes hours
#1: b 230 JKL 5.6
#2: b 230 JKL 5.6
#3: b 230 JKL 5.6
#4: b 340 BVH 8.0
#5: b 340 BVH 8.0
Upvotes: 0