Reputation: 110
I am trying to calculate the percentage of one column over the other while taking into account both condition and day. End result is to have a row per condition, per day, with the percentage value. My current progress can be found below, but I am stuck in the final step. Any help is greatly appreciated.
Load data:
ID <-c(rep("A", 5), rep("B",6), rep("C",4))
Day <- c(1,1,1,2,2,1,1,1,2,2,2,1,1,1,2)
Results1 <- c("x","z","z","z","x","z","x","z","z","z","x","x","z","z","x")
Results2 <- c(1,0,0,1,1,1,2,1,1,1,1,1,1,0,1)
x <- data.table(ID, Day, Results1)
x
Calculate global percentage:
sum(x$Results1== "x") / (sum(x$Results1 == "x") + sum(x$Results1 == "z")) * 100
Attempt on calculating per day and condition:
a <- as.data.table(x)[, lapply(.SD, sum(x$Results1== "x") / (sum(x$Results1 == "x") + sum(x$Results1 == "z")) * 100), by .(x$ID, x$Day)]
Upvotes: 1
Views: 1998
Reputation: 6362
Don't use $
within data.table, as this calls the total data.table, not the group you are subsetting:
x[, .( (sum(Results1 == "x") / .N) * 100), by = .(ID, Day)]
ID Day V1
1: A 1 33.33333
2: A 2 50.00000
3: B 1 33.33333
4: B 2 33.33333
5: C 1 33.33333
6: C 2 100.00000
If you have more than one results column:
x[, .(lapply(.SD, function(col) {(sum(col == "x") / .N )* 100})), by = .(ID, Day)]
Upvotes: 4