WalterB
WalterB

Reputation: 110

Calculate percentage with two variables

I am trying to calculate the percentage of one column over the other while taking into account both condition and day. End result is to have a row per condition, per day, with the percentage value. My current progress can be found below, but I am stuck in the final step. Any help is greatly appreciated.

Load data:

ID <-c(rep("A", 5), rep("B",6), rep("C",4))
Day <- c(1,1,1,2,2,1,1,1,2,2,2,1,1,1,2)
Results1 <- c("x","z","z","z","x","z","x","z","z","z","x","x","z","z","x")
Results2 <- c(1,0,0,1,1,1,2,1,1,1,1,1,1,0,1)

x <- data.table(ID, Day, Results1)
x

Calculate global percentage:

sum(x$Results1== "x") / (sum(x$Results1 == "x") + sum(x$Results1 == "z")) * 100

Attempt on calculating per day and condition:

a <- as.data.table(x)[, lapply(.SD, sum(x$Results1== "x") / (sum(x$Results1 == "x") + sum(x$Results1 == "z")) * 100), by .(x$ID, x$Day)]

Upvotes: 1

Views: 1998

Answers (1)

Chris
Chris

Reputation: 6362

Don't use $ within data.table, as this calls the total data.table, not the group you are subsetting:

x[, .( (sum(Results1 == "x") / .N) * 100), by = .(ID, Day)]
   ID Day        V1
1:  A   1  33.33333
2:  A   2  50.00000
3:  B   1  33.33333
4:  B   2  33.33333
5:  C   1  33.33333
6:  C   2 100.00000

If you have more than one results column:

x[, .(lapply(.SD, function(col) {(sum(col == "x") / .N )* 100})), by = .(ID, Day)]

Upvotes: 4

Related Questions