Reputation: 2505
I have a data frame with n rows and m columns, where m > 30.
My first column is an age variable and the rest are medical conditions that are either on or off (binary).
Now I would like to compute the number of observations where none of the medical conditions is switched on, i.e. the number of healthy patients. I thought I could use the rowSums function to count the observations whose row sum is zero (excluding the age variable, of course), but the variants I tried did not work.
Here is an example. The approaches I got working all involve a long chain of AND / OR conditions, which is not practical with more than 30 columns. I am looking for a non-loop solution.
example <- as.data.frame(matrix(data=c(40,1,1,1,36,1,0,1,56,0,0,1,43,0,0,0), nrow=4, ncol=4,
byrow=TRUE, dimnames=list(c("row1","row2","row3","row4"),c("Age","x","y","z"))))
Two impractical alternatives that arrive at the desired outcome:
nrow(subset(example, x==0 & y==0 & z==0))
table(example$x==0 & example$y==0 & example$z==0)
What I actually wanted is something like this:
nrow(example[rowSums(example[,2:ncol(example)])==0])
Upvotes: 0
Views: 1151
Reputation: 835
You just want the total number of observations/rows that satisfy this condition, right? Then you can use:
nrow(example[example$x==0 & example$y==0 & example$z==0,])
Otherwise, if you want to use rowSums, this will work (note the comma after the row condition, which selects rows rather than columns):
nrow(example[rowSums(example[,2:4])==0,])
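Since the real data has more than 30 condition columns, hard-coding 2:4 will not carry over. The sketch below is one way to generalize the same rowSums idea. It assumes that Age is always the first column, that every other column is coded 0/1, and that there are no missing values; the names conditions, healthy and n_healthy are illustrative only.

```r
# Rebuild the example data frame from the question.
example <- as.data.frame(matrix(
  data = c(40, 1, 1, 1,
           36, 1, 0, 1,
           56, 0, 0, 1,
           43, 0, 0, 0),
  nrow = 4, ncol = 4, byrow = TRUE,
  dimnames = list(c("row1", "row2", "row3", "row4"),
                  c("Age", "x", "y", "z"))
))

# Drop the first column (Age) by position, so the code does not depend on m.
# drop = FALSE keeps a data frame even if only one condition column remains.
conditions <- example[, -1, drop = FALSE]

# TRUE for rows where no condition is switched on (assumes 0/1 coding, no NAs).
healthy <- rowSums(conditions) == 0

# Summing a logical vector counts its TRUE values.
n_healthy <- sum(healthy)
n_healthy
## [1] 1
```

Counting with sum() on the logical vector avoids building the subsetted data frame just to call nrow() on it.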
Upvotes: 0
Reputation: 17189
You can use
apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0))
## row1 row2 row3 row4
## FALSE FALSE FALSE TRUE
Here you are applying FUN to every row of example[, -1]. It gives you a logical vector indicating which rows satisfy the condition that all of the variables in that row are equal to 0. You get this by using the all function inside the function you pass as the FUN argument.
You can use this result to get the rows containing healthy patients, or those containing patients with at least one condition.
example[apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0)), ]
## Age x y z
## row4 43 0 0 0
example[!apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0)), ]
## Age x y z
## row1 40 1 1 1
## row2 36 1 0 1
## row3 56 0 0 1
And you can get the number of healthy rows, or of the remaining rows, as below:
# healthy rows
sum(apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0)))
## [1] 1
# rows with at least one unhealthy condition
sum(!apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0)))
## [1] 3
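The calls above recompute the same apply() result each time. As a minimal sketch, the logical vector could be stored once and reused for both the subsets and the counts. The variable name healthy is illustrative, and the example data frame is rebuilt here only so that the snippet runs on its own.

```r
# Rebuild the example data frame from the question.
example <- as.data.frame(matrix(
  data = c(40, 1, 1, 1,
           36, 1, 0, 1,
           56, 0, 0, 1,
           43, 0, 0, 0),
  nrow = 4, ncol = 4, byrow = TRUE,
  dimnames = list(c("row1", "row2", "row3", "row4"),
                  c("Age", "x", "y", "z"))
))

# Compute the row-wise "all conditions are zero" flag once.
healthy <- apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0))

# Reuse it for subsetting ...
healthy_rows   <- example[healthy, ]
unhealthy_rows <- example[!healthy, ]

# ... and for counting.
sum(healthy)
## [1] 1
sum(!healthy)
## [1] 3
```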
Upvotes: 2