Reputation:
This question is a follow-up to Group to Group division
date bal D
1: 1/31/2013 10 NA
2: 1/31/2013 11 NA
3: 1/31/2013 12 NA
4: 1/31/2013 13 NA
5: 1/31/2013 14 NA
6: 2/28/2013 20 NA
7: 2/28/2013 30 3.000000
8: 2/28/2013 40 3.636364
9: 2/28/2013 50 4.166667
10: 2/28/2013 60 4.615385
11: 3/30/2013 10 NA
12: 3/30/2013 11 0.550000
13: 3/30/2013 12 0.400000
14: 3/30/2013 13 0.325000
15: 3/30/2013 15 0.300000
as per the below:
library(data.table) # CRAN version 1.10.4 used
setDT(bb)[, D := bal / shift(bal, 6L)][seq(1L, nrow(bb), 5L), D := NA][]
Now my questions is:
at every 4th and 5th of each group, the answer should print 100%, that means, for 9th, 10th, 14th and 15th, and so on, the values under D should be 100%
Values in D should be in %
expected o/p
date bal D
1: 1/31/2013 10 NA
2: 1/31/2013 11 NA
3: 1/31/2013 12 NA
4: 1/31/2013 13 100.00
5: 1/31/2013 14 100.00
6: 2/28/2013 20 NA
7: 2/28/2013 30 300.00
8: 2/28/2013 40 363.64
9: 2/28/2013 50 100.00
10: 2/28/2013 60 100.00
11: 3/30/2013 10 NA
12: 3/30/2013 11 55.00
13: 3/30/2013 12 40.00
14: 3/30/2013 13 100.00
15: 3/30/2013 15 100.00
thats the expected output.
Upvotes: 0
Views: 65
Reputation: 42582
It is supposed that same conditions hold as in my previous answer, namely that there is always the same number of rows for each date. With this observation, a very simple solution is possible by just lagging the value of bal
by 6 rows to get the denominator. As this ignores the groups in first place, it is necessary to set the result D
to NA
for the first row in each group, i.e. in every 5th row, finally.
The additional request that specific rows need to be manually overwritten with 1.0
(printed as 100%
) is handled likewise by computing the respective indices.
library(data.table)
setDT(bb)[, D := formattable::percent(bal / shift(bal, 6L))][seq(1L, .N, 5L), D := NA][
rep(seq(4L, nrow(bb), 5L), each = 2L) + 0:1, D := 1.0][]
date bal D 1: 1/31/2013 10 NA 2: 1/31/2013 11 NA 3: 1/31/2013 12 NA 4: 1/31/2013 13 100.00% 5: 1/31/2013 14 100.00% 6: 2/28/2013 20 NA 7: 2/28/2013 30 300.00% 8: 2/28/2013 40 363.64% 9: 2/28/2013 50 100.00% 10: 2/28/2013 60 100.00% 11: 3/30/2013 10 NA 12: 3/30/2013 11 55.00% 13: 3/30/2013 12 40.00% 14: 3/30/2013 13 100.00% 15: 3/30/2013 15 100.00%
Note that the percent
function from the formattable
package is used. This has the advantage that the values are still numeric which can be used for calculation but are printed as percent.
By request of the OP, here is also a version without using formattable::percent()
:
setDT(bb)[, D := 100.0 * bal / shift(bal, 6L)][seq(1L, .N, 5L), D := NA][
rep(seq(4L, nrow(bb), 5L), each = 2L) + 0:1, D := 100.0][]
date bal D 1: 1/31/2013 10 NA 2: 1/31/2013 11 NA 3: 1/31/2013 12 NA 4: 1/31/2013 13 100.0000 5: 1/31/2013 14 100.0000 6: 2/28/2013 20 NA 7: 2/28/2013 30 300.0000 8: 2/28/2013 40 363.6364 9: 2/28/2013 50 100.0000 10: 2/28/2013 60 100.0000 11: 3/30/2013 10 NA 12: 3/30/2013 11 55.0000 13: 3/30/2013 12 40.0000 14: 3/30/2013 13 100.0000 15: 3/30/2013 15 100.0000
The OP has requested to have a dynamic version where the user can select which rows in each group will be 100. I have tried to make a full flexible version where also the number of elements in each group is dynamic (still required to be the same across all groups) and packageded this as a function:
divide_by_group <- function(DF,
id_of_rows_in_group_to_override = NA,
val_override = 100.0) {
library(data.table)
# check parameters
checkmate::assert_data_frame(DF)
checkmate::assert_names(c("date", "bal"), subset.of = names(DF))
checkmate::assert_number(val_override)
# retrieve group length, verify all groups have the same length
l_grp <- setDT(DF)[, .N, by = date][
, if (any(N != first(N))) stop("Differing group lengths") else first(N)]
# verify user specified row ids
checkmate::assert_integerish(id_of_rows_in_group_to_override, lower = 1L, upper = l_grp)
# compute result
result <- DF[, D := 100.0 * bal / shift(bal, l_grp + 1L)][seq(1L, .N, l_grp), D := NA]
# apply override
# compute rows
rn <- c(outer(id_of_rows_in_group_to_override, seq(l_grp, nrow(DF) - l_grp, 5L), `+`))
# verify rn is in range
checkmate::assert_integerish(rn, lower = l_grp + 1L, upper = nrow(DF))
result[rn, D := val_override]
return(result[])
}
Note that more than 50% of code is for checking parameters and assumptions.
Sample calls
divide_by_group(bb)
date bal D 1: 1/31/2013 10 NA 2: 1/31/2013 11 NA 3: 1/31/2013 12 NA 4: 1/31/2013 13 NA 5: 1/31/2013 14 NA 6: 2/28/2013 20 NA 7: 2/28/2013 30 300.0000 8: 2/28/2013 40 363.6364 9: 2/28/2013 50 416.6667 10: 2/28/2013 60 461.5385 11: 3/30/2013 10 NA 12: 3/30/2013 11 55.0000 13: 3/30/2013 12 40.0000 14: 3/30/2013 13 32.5000 15: 3/30/2013 15 30.0000
divide_by_group(bb, 4:5)
date bal D 1: 1/31/2013 10 NA 2: 1/31/2013 11 NA 3: 1/31/2013 12 NA 4: 1/31/2013 13 NA 5: 1/31/2013 14 NA 6: 2/28/2013 20 NA 7: 2/28/2013 30 300.0000 8: 2/28/2013 40 363.6364 9: 2/28/2013 50 100.0000 10: 2/28/2013 60 100.0000 11: 3/30/2013 10 NA 12: 3/30/2013 11 55.0000 13: 3/30/2013 12 40.0000 14: 3/30/2013 13 100.0000 15: 3/30/2013 15 100.0000
divide_by_group(bb, c(2, 5), -9.9)
date bal D 1: 1/31/2013 10 NA 2: 1/31/2013 11 NA 3: 1/31/2013 12 NA 4: 1/31/2013 13 NA 5: 1/31/2013 14 NA 6: 2/28/2013 20 NA 7: 2/28/2013 30 -9.9000 8: 2/28/2013 40 363.6364 9: 2/28/2013 50 416.6667 10: 2/28/2013 60 -9.9000 11: 3/30/2013 10 NA 12: 3/30/2013 11 -9.9000 13: 3/30/2013 12 40.0000 14: 3/30/2013 13 32.5000 15: 3/30/2013 15 -9.9000
Upvotes: 2