Gaurav Singhal
Gaurav Singhal

Reputation: 1086

How to calculate specific percentages

I have a data table in which data is given at 4 different levels (level 0, 1, 2 and 3). I want to calculate how supply in level 3 is distributed in level 2 for each state. (I have kept occ_code in table so that even if state_code and level are same, it is a unique record)

Creating sample Table:

library(data.table)
state_code = c(rep(1,14))
level = c(0,1,2,3,3,2,3,1,2,3,3,3,2,3)
occ_code = LETTERS[1:14]
supply = c(100,60,50,25,25,10,10,40,30,10,10,10,10,10)    
DT = data.table(state_code,occ_code,level,supply)

Desired output

perc = c(NA,NA,NA,0.5,0.5,NA,1,NA,NA,0.33,0.33,0.33,NA,1)
DT2 = data.table(DT,perc)

Basically, I want to use these percentages to project another data which is given only at level 2.

Upvotes: 1

Views: 99

Answers (2)

Jaap
Jaap

Reputation: 83275

A possible solution:

DT[, rl := rleid(level), by = state_code
   ][level == 3, perc := supply/sum(supply), by = .(state_code, rl)
     ][, rl := NULL][]

which gives:

> DT
    state_code occ_code level supply      perc
 1:          1        A     0    100        NA
 2:          1        B     1     60        NA
 3:          1        C     2     50        NA
 4:          1        D     3     25 0.5000000
 5:          1        E     3     25 0.5000000
 6:          1        F     2     10        NA
 7:          1        G     3     10 1.0000000
 8:          2        H     1     40        NA
 9:          2        I     2     30        NA
10:          2        J     3     10 0.3333333
11:          2        K     3     10 0.3333333
12:          2        L     3     10 0.3333333
13:          2        M     2     10        NA
14:          2        N     3     10 1.0000000

Upvotes: 3

manotheshark
manotheshark

Reputation: 4357

Restructuring the data to store information for only level 3. The other information can be calculated from this:

library(data.table)
dt3 <- DT[level == 3, ]
dt3[, parent := c("2C", "2C", "2F", "2I", "2I", "2I", "2M")]
dt3[, perc := round(supply / sum(supply), 4), by = parent]

   state_code occ_code level supply parent   perc
1:          1        D     3     25     2C 0.5000
2:          1        E     3     25     2C 0.5000
3:          1        G     3     10     2F 1.0000
4:          2        J     3     10     2I 0.3333
5:          2        K     3     10     2I 0.3333
6:          2        L     3     10     2I 0.3333
7:          2        N     3     10     2M 1.0000

Calculating the supply for level 0, 1, and 2 respectively:

dt3[, sum(supply)]
dt3[, sum(supply), by = state_code]
dt3[, sum(supply), by = parent]

A second approach:

DT[level == 2, parent := paste0(level, occ_code)]
DT[level > 1, parent := parent[1], by = .(cumsum(!is.na(parent)))]
DT[level == 3, perc := round(supply / sum(supply), 4), by = parent]

    state_code occ_code level supply parent   perc
 1:          1        A     0    100     NA     NA
 2:          1        B     1     60     NA     NA
 3:          1        C     2     50     2C     NA
 4:          1        D     3     25     2C 0.5000
 5:          1        E     3     25     2C 0.5000
 6:          1        F     2     10     2F     NA
 7:          1        G     3     10     2F 1.0000
 8:          2        H     1     40     NA     NA
 9:          2        I     2     30     2I     NA
10:          2        J     3     10     2I 0.3333
11:          2        K     3     10     2I 0.3333
12:          2        L     3     10     2I 0.3333
13:          2        M     2     10     2M     NA
14:          2        N     3     10     2M 1.0000

Upvotes: 1

Related Questions