CharlesLDN

Reputation: 181

R data frame: how to count the number of rows that meet at least one of several conditions?

In a data frame (a patient database), I want to count the number of rows (patients) in which a specific condition, here a value of 3, occurs at least once (using the "or" operator, "|") across repeated assessments (in fact, repeat surgeries). The value 3 may occur one, two, three, four or more times across the one, two, three or more assessments. If the value 3 is recorded at least once, the row (patient) should be counted. Here is a modified extract of my data frame, which has 62 columns and around 300 rows.

> df
    grade_chir_1 grade_chir_2 grade_chir_3 grade_d_chir
2              1          NaN            3            3
3              1          NaN          NaN          NaN
4            NaN            2          NaN          NaN
5              2          NaN          NaN          NaN
6              2            3            2            3
7              3          NaN          NaN          NaN
8              1          NaN            3          NaN
9              1          NaN          NaN          NaN
10             3            3          NaN          NaN
11             1            3            3          NaN
12             1          NaN          NaN          NaN
13             2            2          NaN          NaN
14             1          NaN          NaN          NaN
15             1            3            2            3
16             1          NaN          NaN          NaN

So far I have only found this not very elegant way to do it:

count(datam$grade_chir_1 == 3 | datam$grade_chir_2==3 | datam$grade_chir_3==3 | datam$grade_d_chir==3)[1,2]

This gives me what I presume is the right number, but not in a very nice fashion.

Any suggestion would be very welcome.

Thanks.

Upvotes: 0

Views: 3133

Answers (4)

CharlesLDN

Reputation: 181

Something better is:

table(datam$grade_chir_1 == 3 | datam$grade_chir_2==3 | datam$grade_chir_3==3 | datam$grade_d_chir==3)
TRUE 
10 

But it is still not satisfactory.

There is also this solution:

sum(datam$grade_chir_1 == 3 | datam$grade_chir_2 == 3 | datam$grade_chir_3 == 3 | datam$grade_d_chir == 3, na.rm = TRUE)
[1] 10
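
As a side note on why na.rm is needed in the sum() version, here is a minimal sketch of how comparisons and "|" behave with missing values in base R. The vector name `flags` is only illustrative and is not part of the original data.

```r
# Comparing a missing value with 3 yields NA, not FALSE
cmp <- NaN == 3                 # NA

# With "|", a TRUE on either side wins; otherwise NA propagates
or_true  <- NA | TRUE           # TRUE
or_false <- NA | FALSE          # NA

# sum() over a logical vector containing NA returns NA unless NAs are dropped
flags <- c(TRUE, NA, FALSE)     # illustrative vector
plain_sum <- sum(flags)                   # NA
clean_sum <- sum(flags, na.rm = TRUE)     # 1
```

So a row with no 3 but at least one NaN evaluates to NA rather than FALSE, which is why table() may show only a TRUE count and why sum() needs na.rm = TRUE.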

Upvotes: 0

ndr

Reputation: 1437

Or

datam <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
       grade_chir_1 grade_chir_2 grade_chir_3 grade_d_chir
          1          NaN            3            3
          1          NaN          NaN          NaN
        NaN            2          NaN          NaN
          2          NaN          NaN          NaN
          2            3            2            3
          3          NaN          NaN          NaN
          1          NaN            3          NaN
          1          NaN          NaN          NaN
         3            3          NaN          NaN
         1            3            3          NaN
         1          NaN          NaN          NaN
         2            2          NaN          NaN
         1          NaN          NaN          NaN
         1            3            2            3
         1          NaN          NaN          NaN
        ')
datam
sum(rowSums(datam == 3, na.rm=TRUE) > 0)
[1] 7
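
Since the real data frame is said to have 62 columns, the comparison probably should not run over every column. A possible sketch, assuming the relevant columns all start with "grade_" (an assumption based on the extract; the `toy` data frame and its `age` column are made up for illustration):

```r
# Toy data: 'age' is a hypothetical non-grade column that must be ignored
toy <- data.frame(
  age          = c(40, 55, 3),
  grade_chir_1 = c(1, 3, NaN),
  grade_chir_2 = c(NaN, NaN, 2),
  grade_d_chir = c(3, NaN, NaN)
)

# Select the grade columns by name pattern (assumed naming convention)
grade_cols <- grep("^grade_", names(toy), value = TRUE)

# Count rows where at least one grade column equals 3
n_with_3 <- sum(rowSums(toy[grade_cols] == 3, na.rm = TRUE) > 0)
n_with_3
```

Here the third row has age 3 but no grade of 3, so it is not counted.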

Upvotes: 3

IRTFM

Reputation: 263301

Your use of multiple OR conditions suggested this method:

> sum( apply(datam, 1, function(x) any(x==3) ), na.rm=TRUE)
[1] 7

Upvotes: 0

tkmckenzie

Reputation: 1363

Maybe not the most elegant solution, but you can use sapply to get whether 3 is in each row, then use sum to count the number of rows that match that condition:

sapply(1:nrow(df), function(row) 3 %in% df[row,])
# [1]  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE
# [13] FALSE  TRUE FALSE
sum(sapply(1:nrow(df), function(row) 3 %in% df[row,]))
# [1] 7
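
One reason %in% may be convenient here is that it never returns NA, whereas any(x == 3) can. A small sketch comparing the two on made-up data (the `toy` data frame is an assumption for illustration; apply() over rows is swapped in for the sapply() loop above):

```r
# Toy data with missing values (illustrative only)
toy <- data.frame(a = c(1, 3, NaN), b = c(NaN, NaN, 2))

# any(x == 3) yields NA for rows with no 3 but at least one missing value
has_3_any <- apply(toy, 1, function(x) any(x == 3))

# %in% treats missing values as non-matches, so the result is TRUE or FALSE
has_3_in <- apply(toy, 1, function(x) 3 %in% x)

# No na.rm needed when summing the %in% result
n_rows <- sum(has_3_in)
```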

Upvotes: 0
