Triamus

Reputation: 2505

R: Count the rows in a data frame whose row sum over specific columns is 0

I have a data frame with n rows and m columns where m > 30.

My first column is an age variable and the rest are medical conditions that are either on or off (binary).

Now I would like to count the observations where none of the medical conditions is switched on, i.e. the number of healthy patients. I thought I could use the rowSums function to count the rows whose sum is zero (excluding the age variable, of course), but the variants I tried did not work.

Here is an example of how it can be done, but only by chaining a lot of AND / OR conditions, which is not practical with more than 30 columns. I am looking for a solution without a loop.

example <- as.data.frame(matrix(data=c(40,1,1,1,36,1,0,1,56,0,0,1,43,0,0,0), nrow=4, ncol=4, 
byrow=TRUE, dimnames=list(c("row1","row2","row3","row4"),c("Age","x","y","z"))))

Two impractical ways to arrive at the desired outcome:

nrow(subset(example, x==0 & y==0 & z==0))
table(example$x==0 & example$y==0 & example$z==0)

What I actually wanted is something like this:

nrow(example[rowSums(example[,2:ncol(example)])==0])

Upvotes: 0

Views: 1151

Answers (2)

RHelp

Reputation: 835

You just want the total number of observations (rows) that satisfy this condition, right? Then you can use:

nrow(example[example$x==0 & example$y==0 & example$z==0,])

Otherwise, if you want to use rowSums, this will work (note the comma after the row condition, which selects rows rather than columns):

nrow(example[rowSums(example[,2:4])==0,])
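Since the real data has more than 30 condition columns, the same idea can be written without hard-coding column positions. This is a minimal sketch, assuming the age variable is always the first column and every other column is a 0/1 indicator. It rebuilds the question's example so it runs on its own; the names healthy and n_healthy are made up for illustration.

```r
# Rebuild the example data from the question
example <- as.data.frame(matrix(
  data = c(40, 1, 1, 1,
           36, 1, 0, 1,
           56, 0, 0, 1,
           43, 0, 0, 0),
  nrow = 4, ncol = 4, byrow = TRUE,
  dimnames = list(c("row1", "row2", "row3", "row4"),
                  c("Age", "x", "y", "z"))
))

# Drop the first column (Age, by assumption) and flag the rows
# whose remaining 0/1 columns sum to zero
healthy <- rowSums(example[, -1]) == 0

# Summing a logical vector counts its TRUE values
n_healthy <- sum(healthy)
n_healthy
```

Counting with sum() on the logical vector avoids building the subsetted data frame just to call nrow() on it.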

Upvotes: 0

CHP

Reputation: 17189

You can use

apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0))
##  row1  row2  row3  row4 
## FALSE FALSE FALSE  TRUE 

Here FUN is applied to every row of example[, -1], i.e. the data frame without the Age column. The result is a logical vector indicating which rows have all of their variables equal to 0; the all function inside FUN performs that check.

You can use this result to extract the rows for healthy patients, or the rows for patients with at least one condition.

example[apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0)), ]
##      Age x y z
## row4  43 0 0 0

example[!apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0)), ]
##      Age x y z
## row1  40 1 1 1
## row2  36 1 0 1
## row3  56 0 0 1

You can also count the healthy rows, or the remaining ones, as below:

# healthy rows
sum(apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0)))
## [1] 1


# rows with at least one condition switched on
sum(!apply(example[, -1], MARGIN = 1, FUN = function(x) all(x == 0)))
## [1] 3
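For a wide data frame, a vectorized comparison may be a lighter alternative to apply, which calls FUN once per row. The sketch below is not part of the original answer. It assumes the first column is Age and counts non-zero entries per row, so it does not rely on the condition columns being strictly 0/1. The names conditions and all_zero are made up for illustration.

```r
# Same data as the question's example, built column by column
example <- data.frame(
  Age = c(40, 36, 56, 43),
  x   = c(1, 1, 0, 0),
  y   = c(1, 0, 0, 0),
  z   = c(1, 1, 1, 0),
  row.names = c("row1", "row2", "row3", "row4")
)

# Everything except the first column (assumed to be Age)
conditions <- example[, -1]

# conditions != 0 is a logical matrix; a row with zero TRUE
# entries has no condition switched on
all_zero <- rowSums(conditions != 0) == 0

n_healthy   <- sum(all_zero)   # rows with every condition off
n_unhealthy <- sum(!all_zero)  # rows with at least one condition on
```

On the example data this should agree with the apply version: one healthy row and three rows with at least one condition.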

Upvotes: 2
