mraj33
mraj33

Reputation: 13

how to apply a function to a certain portion of a filtered data in a data.table in r

I have a data.table and would like to do the following:

  1. filter the data based on multiple conditions
  2. apply a function to a "portion" of this filtered data on a single column
  3. modify/update other columns corresponding to the same "portion" of the filtered data
  4. modify/update the remaining portion of the other columns
  5. all updates done inplace of the data.table to keep the same dimensions

Here is an example: Data table:

dt <- data.table(ename = rep(c('a','b'),10),
                 tcode = rep(c(100,200,300,400),5),
                 tcdes = rep(c('EFG','HIJ','KLM','NGH'),5),
                 hours = rep(c(8),20))
# output
# ename tcode tcdes hours
# 1:     a   100   EFG     8
# 2:     b   200   HIJ     8
# 3:     a   300   KLM     8
# 4:     b   400   NGH     8
# 5:     a   100   EFG     8
# 6:     b   200   HIJ     8
# 7:     a   300   KLM     8
# 8:     b   400   NGH     8
# 9:     a   100   EFG     8
# 10:     b   200   HIJ     8
# 11:     a   300   KLM     8
# 12:     b   400   NGH     8
# 13:     a   100   EFG     8
# 14:     b   200   HIJ     8
# 15:     a   300   KLM     8
# 16:     b   400   NGH     8
# 17:     a   100   EFG     8
# 18:     b   200   HIJ     8
# 19:     a   300   KLM     8
# 20:     b   400   NGH     8

#1. Filter the data by multiple conditions:

dt[(ename == 'b'& tcode == 400),]

# output
#   ename tcode tcdes hours
#1:     b   400   NGH     8
#2:     b   400   NGH     8
#3:     b   400   NGH     8
#4:     b   400   NGH     8
#5:     b   400   NGH     8

  1. The function I would like to apply is:

    a. take a certain % (portion) of this filtered data (say 70% which will gives first 3.5 rows - so flooring this value to get 3 rows (rows: 1 to 3))

    b. multiply the hours column by 0.7 for this portion

  2. Modify the tcode and tcdes columns for this portion of data to:

    a. tcode = 230

    b. tcdes = "JKL"

  3. Modify the tcode and tcdes columns of the remaining portion (2 rows: 4 & 5) of the filtered data to:

    a. tcode = 340

    b. tcdes = "BVH"

5. The filtered result should look like:

dt[(ename == 'b'& tcode == 230 & tcode == 340),]

# output
#   ename tcode tcdes hours
#1:     b   230   JKL     5.6
#2:     b   230   JKL     5.6
#3:     b   230   JKL     5.6
#4:     b   340   BVH     8
#5:     b   340   BVH     8

I am new to R as a whole and will greatly appreciate the help. I have tried to use the lapply() with .SD, .SDcols and :=, := but cannot get the function to be applied to a portion and maintain the full data.table.

Many thanks.

Upvotes: 1

Views: 140

Answers (1)

akrun
akrun

Reputation: 887163

We could create a list or data.table with the values and then either do a join or use Map to replace the values

library(data.table)
newdt <- data.table(tcode = c(230, 340), tcdes = c("JKL", "BVH"), hours = c(0.7, 1))
dt[(ename == 'b'& tcode == 400),  names(newdt) := {
         i1 <- as.integer(0.7 * .N)
         i2 <- rep(1:2, c(i1, .N - i1))
          Map(function(x, y, z) if(z) x[i2] * y[i2] else y[i2],
             .SD, newdt, c(FALSE, FALSE, TRUE))
       }, .SDcols = names(newdt)]



dt[(ename == 'b'& tcode == 230 | tcode == 340)]
#   ename tcode tcdes hours
#1:     b   230   JKL   5.6
#2:     b   230   JKL   5.6
#3:     b   230   JKL   5.6
#4:     b   340   BVH   8.0
#5:     b   340   BVH   8.0

Upvotes: 0

Related Questions