user6508283
user6508283

Reputation:

Grouping and dividing between the columns

This question is a follow-up to Group to Group division

          date bal        D
 1: 1/31/2013  10       NA
 2: 1/31/2013  11       NA
 3: 1/31/2013  12       NA
 4: 1/31/2013  13       NA
 5: 1/31/2013  14       NA
 6: 2/28/2013  20       NA
 7: 2/28/2013  30 3.000000
 8: 2/28/2013  40 3.636364
 9: 2/28/2013  50 4.166667
10: 2/28/2013  60 4.615385
11: 3/30/2013  10       NA
12: 3/30/2013  11 0.550000
13: 3/30/2013  12 0.400000
14: 3/30/2013  13 0.325000
15: 3/30/2013  15 0.300000

as per the below:

library(data.table)   # CRAN version 1.10.4 used
setDT(bb)[, D := bal / shift(bal, 6L)][seq(1L, nrow(bb), 5L), D := NA][]

Now my questions is:

  1. at every 4th and 5th of each group, the answer should print 100%, that means, for 9th, 10th, 14th and 15th, and so on, the values under D should be 100%

  2. Values in D should be in %

expected o/p

       date     bal    D
 1: 1/31/2013  10      NA
 2: 1/31/2013  11      NA
 3: 1/31/2013  12      NA
 4: 1/31/2013  13 100.00
 5: 1/31/2013  14 100.00
 6: 2/28/2013  20      NA
 7: 2/28/2013  30 300.00
 8: 2/28/2013  40 363.64
 9: 2/28/2013  50 100.00
10: 2/28/2013  60 100.00
11: 3/30/2013  10      NA
12: 3/30/2013  11  55.00
13: 3/30/2013  12  40.00
14: 3/30/2013  13 100.00
15: 3/30/2013  15 100.00

thats the expected output.

Upvotes: 0

Views: 65

Answers (1)

Uwe
Uwe

Reputation: 42582

It is supposed that same conditions hold as in my previous answer, namely that there is always the same number of rows for each date. With this observation, a very simple solution is possible by just lagging the value of bal by 6 rows to get the denominator. As this ignores the groups in first place, it is necessary to set the result D to NA for the first row in each group, i.e. in every 5th row, finally.

The additional request that specific rows need to be manually overwritten with 1.0 (printed as 100%) is handled likewise by computing the respective indices.

library(data.table)
setDT(bb)[, D := formattable::percent(bal / shift(bal, 6L))][seq(1L, .N, 5L), D := NA][
  rep(seq(4L, nrow(bb), 5L), each = 2L) + 0:1, D := 1.0][]
         date bal       D
 1: 1/31/2013  10      NA
 2: 1/31/2013  11      NA
 3: 1/31/2013  12      NA
 4: 1/31/2013  13 100.00%
 5: 1/31/2013  14 100.00%
 6: 2/28/2013  20      NA
 7: 2/28/2013  30 300.00%
 8: 2/28/2013  40 363.64%
 9: 2/28/2013  50 100.00%
10: 2/28/2013  60 100.00%
11: 3/30/2013  10      NA
12: 3/30/2013  11  55.00%
13: 3/30/2013  12  40.00%
14: 3/30/2013  13 100.00%
15: 3/30/2013  15 100.00%

Note that the percent function from the formattable package is used. This has the advantage that the values are still numeric which can be used for calculation but are printed as percent.


By request of the OP, here is also a version without using formattable::percent():

setDT(bb)[, D := 100.0 * bal / shift(bal, 6L)][seq(1L, .N, 5L), D := NA][
  rep(seq(4L, nrow(bb), 5L), each = 2L) + 0:1, D := 100.0][]
         date bal        D
 1: 1/31/2013  10       NA
 2: 1/31/2013  11       NA
 3: 1/31/2013  12       NA
 4: 1/31/2013  13 100.0000
 5: 1/31/2013  14 100.0000
 6: 2/28/2013  20       NA
 7: 2/28/2013  30 300.0000
 8: 2/28/2013  40 363.6364
 9: 2/28/2013  50 100.0000
10: 2/28/2013  60 100.0000
11: 3/30/2013  10       NA
12: 3/30/2013  11  55.0000
13: 3/30/2013  12  40.0000
14: 3/30/2013  13 100.0000
15: 3/30/2013  15 100.0000

The OP has requested to have a dynamic version where the user can select which rows in each group will be 100. I have tried to make a full flexible version where also the number of elements in each group is dynamic (still required to be the same across all groups) and packageded this as a function:

divide_by_group <- function(DF, 
                            id_of_rows_in_group_to_override = NA, 
                            val_override = 100.0) {
  library(data.table)
  # check parameters
  checkmate::assert_data_frame(DF)
  checkmate::assert_names(c("date", "bal"), subset.of = names(DF))
  checkmate::assert_number(val_override)
  # retrieve group length, verify all groups have the same length
  l_grp <- setDT(DF)[, .N, by = date][
    , if (any(N != first(N))) stop("Differing group lengths") else first(N)]
  # verify user specified row ids
  checkmate::assert_integerish(id_of_rows_in_group_to_override, lower = 1L, upper = l_grp)
  # compute result
  result <- DF[, D := 100.0 * bal / shift(bal, l_grp + 1L)][seq(1L, .N, l_grp), D := NA]
  # apply override
  # compute rows
  rn <- c(outer(id_of_rows_in_group_to_override, seq(l_grp, nrow(DF) - l_grp, 5L), `+`))
  # verify rn is in range 
  checkmate::assert_integerish(rn, lower = l_grp + 1L, upper = nrow(DF))
  result[rn, D := val_override]
  return(result[])
}

Note that more than 50% of code is for checking parameters and assumptions.

Sample calls

divide_by_group(bb)
         date bal        D
 1: 1/31/2013  10       NA
 2: 1/31/2013  11       NA
 3: 1/31/2013  12       NA
 4: 1/31/2013  13       NA
 5: 1/31/2013  14       NA
 6: 2/28/2013  20       NA
 7: 2/28/2013  30 300.0000
 8: 2/28/2013  40 363.6364
 9: 2/28/2013  50 416.6667
10: 2/28/2013  60 461.5385
11: 3/30/2013  10       NA
12: 3/30/2013  11  55.0000
13: 3/30/2013  12  40.0000
14: 3/30/2013  13  32.5000
15: 3/30/2013  15  30.0000
divide_by_group(bb, 4:5)
         date bal        D
 1: 1/31/2013  10       NA
 2: 1/31/2013  11       NA
 3: 1/31/2013  12       NA
 4: 1/31/2013  13       NA
 5: 1/31/2013  14       NA
 6: 2/28/2013  20       NA
 7: 2/28/2013  30 300.0000
 8: 2/28/2013  40 363.6364
 9: 2/28/2013  50 100.0000
10: 2/28/2013  60 100.0000
11: 3/30/2013  10       NA
12: 3/30/2013  11  55.0000
13: 3/30/2013  12  40.0000
14: 3/30/2013  13 100.0000
15: 3/30/2013  15 100.0000
divide_by_group(bb, c(2, 5), -9.9)
         date bal        D
 1: 1/31/2013  10       NA
 2: 1/31/2013  11       NA
 3: 1/31/2013  12       NA
 4: 1/31/2013  13       NA
 5: 1/31/2013  14       NA
 6: 2/28/2013  20       NA
 7: 2/28/2013  30  -9.9000
 8: 2/28/2013  40 363.6364
 9: 2/28/2013  50 416.6667
10: 2/28/2013  60  -9.9000
11: 3/30/2013  10       NA
12: 3/30/2013  11  -9.9000
13: 3/30/2013  12  40.0000
14: 3/30/2013  13  32.5000
15: 3/30/2013  15  -9.9000

Upvotes: 2

Related Questions