Reputation: 181
In a data frame (a patient database), I want to count the rows (patients) that meet a specific condition, here the value 3, at least once (hence the "or" operator, "|") across repeated assessments (in fact, repeat surgeries). The condition can occur once, twice, three times or more among the one, two, three or more assessments. If the value 3 is recorded at least once, the row (patient) should be counted. Here is a modified extract of my data frame, which has 62 columns and around 300 rows.
> df
grade_chir_1 grade_chir_2 grade_chir_3 grade_d_chir
2 1 NaN 3 3
3 1 NaN NaN NaN
4 NaN 2 NaN NaN
5 2 NaN NaN NaN
6 2 3 2 3
7 3 NaN NaN NaN
8 1 NaN 3 NaN
9 1 NaN NaN NaN
10 3 3 NaN NaN
11 1 3 3 NaN
12 1 NaN NaN NaN
13 2 2 NaN NaN
14 1 NaN NaN NaN
15 1 3 2 3
16 1 NaN NaN NaN
So far I have only found this not very elegant way to do it:
count(datam$grade_chir_1 == 3 | datam$grade_chir_2==3 | datam$grade_chir_3==3 | datam$grade_d_chir==3)[1,2]
This gives what I presume is the right number, but not in a very nice way.
Any hint would be warmly welcome.
Thanks.
Upvotes: 0
Views: 3133
Reputation: 181
Something better is:
table(datam$grade_chir_1 == 3 | datam$grade_chir_2==3 | datam$grade_chir_3==3 | datam$grade_d_chir==3)
TRUE
10
But this is still not satisfactory.
There is also this solution:
sum(datam$grade_chir_1 == 3 | datam$grade_chir_2==3 | datam$grade_chir_3==3 | datam$grade_d_chir==3, na.rm=T)
[1] 10
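The table() output above shows only a TRUE count, presumably because comparing NaN with == yields NA, and NA | FALSE stays NA while NA | TRUE is TRUE. Here is a small sketch of that behaviour on a made-up three-row data frame (toy, a and b are illustrative names, not part of my data):

```r
# Toy data: the names `toy`, `a` and `b` are hypothetical.
toy <- data.frame(a = c(3, 1, 1), b = c(NaN, NaN, 2))

# NaN == 3 evaluates to NA, so each row combines TRUE/FALSE with NA.
hit <- toy$a == 3 | toy$b == 3
hit
# row 1: TRUE  | NA    gives TRUE
# row 2: FALSE | NA    gives NA
# row 3: FALSE | FALSE gives FALSE

sum(hit)                         # NA, because one element is missing
n_hit <- sum(hit, na.rm = TRUE)  # drops the NA and counts the TRUE rows
n_hit
```

This is why na.rm is needed in sum(), and why rows without a 3 but with a missing assessment do not show up as FALSE in table().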
Upvotes: 0
Reputation: 1437
Or
datam <-read.table(header=T, stringsAsFactors = F, text='
grade_chir_1 grade_chir_2 grade_chir_3 grade_d_chir
1 NaN 3 3
1 NaN NaN NaN
NaN 2 NaN NaN
2 NaN NaN NaN
2 3 2 3
3 NaN NaN NaN
1 NaN 3 NaN
1 NaN NaN NaN
3 3 NaN NaN
1 3 3 NaN
1 NaN NaN NaN
2 2 NaN NaN
1 NaN NaN NaN
1 3 2 3
1 NaN NaN NaN
')
datam
sum(rowSums(datam == 3, na.rm=TRUE) > 0)
[1] 7
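Since the real data frame has 62 columns, the comparison probably has to be restricted to the grade columns. A minimal sketch, assuming every relevant column name starts with "grade_" (datam2 and its age column are invented purely for illustration):

```r
# Hypothetical data: `age` stands in for an unrelated column that happens to contain a 3.
datam2 <- data.frame(
  age          = c(3, 40, 51),
  grade_chir_1 = c(1, 3, 2),
  grade_chir_2 = c(NaN, NaN, 2),
  grade_d_chir = c(NaN, NaN, NaN)
)

# Keep only the columns whose names start with "grade_".
grade_cols <- grep("^grade_", names(datam2), value = TRUE)

# Count the rows with at least one 3 among those columns only.
n_patients <- sum(rowSums(datam2[grade_cols] == 3, na.rm = TRUE) > 0)
n_patients
```

Only the second row is counted here; the 3 in the age column of the first row is ignored.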
Upvotes: 3
Reputation: 263301
Your use of multiple OR conditions suggested this method:
> sum( apply(datam, 1, function(x) any(x==3) ), na.rm=TRUE)
[1] 7
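A possible variant passes na.rm to any() itself, so that every row yields TRUE or FALSE and never NA. A sketch on a small made-up data frame (datam3, g1 and g2 are illustrative names only):

```r
# Hypothetical two-column example with missing assessments.
datam3 <- data.frame(g1 = c(1, 3, 2), g2 = c(NaN, NaN, 2))

# any(..., na.rm = TRUE) ignores the NA produced by NaN == 3.
has_3 <- apply(datam3, 1, function(x) any(x == 3, na.rm = TRUE))
has_3

# No NA is left, so a plain sum() counts the matching rows.
sum(has_3)
```

The per-row logical vector can then also be used to subset the matching patients, e.g. datam3[has_3, ].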
Upvotes: 0
Reputation: 1363
Maybe not the most elegant solution, but you can use sapply to check whether 3 is in each row, then use sum to count the number of rows that match that condition:
sapply(1:nrow(df), function(row) 3 %in% df[row,])
# [1] TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE TRUE TRUE FALSE FALSE
# [13] FALSE TRUE FALSE
sum(sapply(1:nrow(df), function(row) 3 %in% df[row,]))
# [1] 7
Upvotes: 0