Reputation: 1086
I have a data table in which data is given at 4 different levels (level 0, 1, 2 and 3). I want to calculate how supply in level 3 is distributed in level 2 for each state. (I have kept occ_code
in table so that even if state_code
and level
are same, it is a unique record)
Creating sample Table:
library(data.table)
state_code = c(rep(1,14))
level = c(0,1,2,3,3,2,3,1,2,3,3,3,2,3)
occ_code = LETTERS[1:14]
supply = c(100,60,50,25,25,10,10,40,30,10,10,10,10,10)
DT = data.table(state_code,occ_code,level,supply)
Desired output
perc = c(NA,NA,NA,0.5,0.5,NA,1,NA,NA,0.33,0.33,0.33,NA,1)
DT2 = data.table(DT,perc)
Basically, I want to use these percentages to project another data which is given only at level 2.
Upvotes: 1
Views: 99
Reputation: 83275
A possible solution:
DT[, rl := rleid(level), by = state_code
][level == 3, perc := supply/sum(supply), by = .(state_code, rl)
][, rl := NULL][]
which gives:
> DT state_code occ_code level supply perc 1: 1 A 0 100 NA 2: 1 B 1 60 NA 3: 1 C 2 50 NA 4: 1 D 3 25 0.5000000 5: 1 E 3 25 0.5000000 6: 1 F 2 10 NA 7: 1 G 3 10 1.0000000 8: 2 H 1 40 NA 9: 2 I 2 30 NA 10: 2 J 3 10 0.3333333 11: 2 K 3 10 0.3333333 12: 2 L 3 10 0.3333333 13: 2 M 2 10 NA 14: 2 N 3 10 1.0000000
Upvotes: 3
Reputation: 4357
Restructuring the data to store information for only level 3. The other information can be calculated from this:
library(data.table)
dt3 <- DT[level == 3, ]
dt3[, parent := c("2C", "2C", "2F", "2I", "2I", "2I", "2M")]
dt3[, perc := round(supply / sum(supply), 4), by = parent]
state_code occ_code level supply parent perc
1: 1 D 3 25 2C 0.5000
2: 1 E 3 25 2C 0.5000
3: 1 G 3 10 2F 1.0000
4: 2 J 3 10 2I 0.3333
5: 2 K 3 10 2I 0.3333
6: 2 L 3 10 2I 0.3333
7: 2 N 3 10 2M 1.0000
Calculating the supply
for level
0, 1, and 2 respectively:
dt3[, sum(supply)]
dt3[, sum(supply), by = state_code]
dt3[, sum(supply), by = parent]
A second approach:
DT[level == 2, parent := paste0(level, occ_code)]
DT[level > 1, parent := parent[1], by = .(cumsum(!is.na(parent)))]
DT[level == 3, perc := round(supply / sum(supply), 4), by = parent]
state_code occ_code level supply parent perc
1: 1 A 0 100 NA NA
2: 1 B 1 60 NA NA
3: 1 C 2 50 2C NA
4: 1 D 3 25 2C 0.5000
5: 1 E 3 25 2C 0.5000
6: 1 F 2 10 2F NA
7: 1 G 3 10 2F 1.0000
8: 2 H 1 40 NA NA
9: 2 I 2 30 2I NA
10: 2 J 3 10 2I 0.3333
11: 2 K 3 10 2I 0.3333
12: 2 L 3 10 2I 0.3333
13: 2 M 2 10 2M NA
14: 2 N 3 10 2M 1.0000
Upvotes: 1