Count the number of cases in two of several categories in R?

Question

I have a dataset that describes a sample of people and the number and type of diseases they have. Here, 1 means that the person has the disease and 0 means that the person does not have the disease. NA denotes missing values. It looks something like this:

library(tidyverse)

df <- tribble(
    ~Heart_disease, ~Lung_disease, ~Bowel_disease, ~Nerve_disease, ~Liver_disease
    , 0, 1, 0, 1, 0
    , NA, 0, 0, 0, 0
    , 1, 1, 1, 1, 0
    , 0, 1, 0, 0, 1
    , 1, 0, 0, 1, 0
    , 0, 0, 1, NA, NA
    , 1, 0, 0, 0, 0
    , 0, 0, 1, 0, 1
    , 0, 0, 0, 0, 0
    , 0, 1, 1, 1, 1
)

   Heart_disease Lung_disease Bowel_disease Nerve_disease Liver_disease
           <dbl>        <dbl>         <dbl>         <dbl>         <dbl>
 1             0            1             0             1             0
 2            NA            0             0             0             0
 3             1            1             1             1             0
 4             0            1             0             0             1
 5             1            0             0             1             0
 6             0            0             1            NA            NA
 7             1            0             0             0             0
 8             0            0             1             0             1
 9             0            0             0             0             0
10             0            1             1             1             1

I would like to know: a) How many people have two diseases? b) How many people have three or more diseases?

How could I calculate this using R?

Many thanks for your help

jazzurro · Accepted Answer

Here is one way. I assume each row represents one person. Use rowSums() to get the number of diseases in each row; with the na.rm argument set, missing values are skipped, so they are effectively counted as 0. Once you have that total, you can aggregate the data: I counted how many rows have a total of exactly 2, and did the same for rows with a total of 3 or more.

library(dplyr)

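# total = number of diseases per person (NA values are skipped)
mutate(mydf, total = rowSums(mydf, na.rm = TRUE)) %>% 
  # two = exactly two diseases; morethan3 = three or more diseases
  summarize(two = sum(total == 2), morethan3 = sum(total >= 3))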

#  two morethan3
#1   4         2
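
For comparison, the same counts can be obtained in base R without dplyr. This is only a sketch under the same assumption as above, namely that a missing value is treated as "no disease". The data frame is rebuilt inline so the snippet runs on its own, and the names two and three_or_more are illustrative.

```r
# Rebuild the example data (same values as in the question)
mydf <- data.frame(
  Heart_disease = c(0, NA, 1, 0, 1, 0, 1, 0, 0, 0),
  Lung_disease  = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 1),
  Bowel_disease = c(0, 0, 1, 0, 0, 1, 0, 1, 0, 1),
  Nerve_disease = c(1, 0, 1, 0, 1, NA, 0, 0, 0, 1),
  Liver_disease = c(0, 0, 0, 1, 0, NA, 0, 1, 0, 1)
)

# Number of diseases per person; NA is skipped, i.e. counted as 0 (an assumption)
total <- rowSums(mydf, na.rm = TRUE)

two           <- sum(total == 2)  # people with exactly two diseases
three_or_more <- sum(total >= 3)  # people with three or more diseases

# Distribution of disease counts across all people
table(total)
```

Here table(total) gives the full breakdown, which can be handy if other cut-offs are needed later.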

DATA

mydf <- structure(list(Heart_disease = c(0L, NA, 1L, 0L, 1L, 0L, 1L, 
0L, 0L, 0L), Lung_disease = c(1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 
0L, 1L), Bowel_disease = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 
1L), Nerve_disease = c(1L, 0L, 1L, 0L, 1L, NA, 0L, 0L, 0L, 1L
), Liver_disease = c(0L, 0L, 0L, 1L, 0L, NA, 0L, 1L, 0L, 1L)), class = 
"data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
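
A caveat on skipping NA values in the row sums: a person with missing entries may be undercounted. One possible check, sketched here as an assumption rather than as part of the original answer, is to compute the largest total each person could have if every missing value were actually 1, and flag the rows whose category could change. The names n_missing, total_max and uncertain are illustrative.

```r
# Rebuild the example data (same values as in the question)
mydf <- data.frame(
  Heart_disease = c(0, NA, 1, 0, 1, 0, 1, 0, 0, 0),
  Lung_disease  = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 1),
  Bowel_disease = c(0, 0, 1, 0, 0, 1, 0, 1, 0, 1),
  Nerve_disease = c(1, 0, 1, 0, 1, NA, 0, 0, 0, 1),
  Liver_disease = c(0, 0, 0, 1, 0, NA, 0, 1, 0, 1)
)

total     <- rowSums(mydf, na.rm = TRUE)  # known diseases per person
n_missing <- rowSums(is.na(mydf))         # missing entries per person
total_max <- total + n_missing            # upper bound if every NA were a 1

# Rows that could move into the "two" or "three or more" group
uncertain <- unname(which(
  (total < 2 & total_max >= 2) | (total < 3 & total_max >= 3)
))
uncertain
```

Any rows flagged this way could be reported separately or excluded, depending on how missing data should be handled in the analysis.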
