Deb
Deb

Reputation: 539

How to count number of rows in data.table using same condition on multiple columns in R

If I have the below data table in R

dt <- data.table(x = c("No", "No", "Yes")
              ,  y = c("No", "Yes", "Yes")
              ,  z = c("No", "No", "No")
              ,  q = c("Hi", "Where, "When")
              ,  AA = c(1, 1,1)
              ,  ID = c(999, 789, 567)
              , bop = c(2345, 2222,2222))

and I want to get the count of "No" in each variable(x, y, z) in below format

Variable Type Count

 x     No      2

 y     No     1

 z     No     3

I am not able to pass the same filter to the columns I want. I can only pass filter to one column

dt[, .(.N), by = .(x,y,z)]


Upvotes: 1

Views: 1464

Answers (1)

akrun
akrun

Reputation: 887891

We can use melt to reshape into 'long' format, and then do a group by 'variable' (i.e. the column name column), get the number of 'No' by summing the logical vector (value == 'No')

library(data.table)
melt(dt, id.var = 'q')[, .(Type = 'No', Count = sum(value == 'No')), .(variable)]
#   variable Type Count
#1:        x   No     2
#2:        y   No     1
#3:        z   No     3

For the new example

melt(dt[, .(x, y, z)], measure = c('x', 'y', 'z'))[, 
   .(Type = 'No', Count = sum(value == 'No')), .(variable)]
#    variable Type Count
#1:        x   No     2
#2:        y   No     1
#3:        z   No     3

Or specify the logical condition in i and use .N

melt(dt[, .(x, y, z)], measure = c('x', 'y', 'z'))[value == 'No',
       .(Type = first(value), Count = .N), variable]
#   variable Type Count
#1:        x   No     2
#2:        y   No     1
#3:        z   No     3

Or without melting we can loop over the Subset of Data.table (.SD), create a logical vector and reduce the list to a single vector with Reduce

dt[, .(variable = names(.SD), Type = 'No', 
     Count = Reduce(`+`, lapply(.SD, `==`, 'No'))), .SDcols = x:z]
#  variable Type Count
#1:        x   No     3
#2:        y   No     2
#3:        z   No     1

Or using rowSums

dt[, .(Variable = names(.SD), Type = 'No',
        Count = rowSums(.SD == 'No')), .SDcols = x:z]

In the OP's code, the grouping columns were taken as 'x', 'y', 'z'. Then, if we do .N, it is looking for the number of rows for each unique set of values across those columns and this will be 1 for each case because there is only 1 row for No No No or No Yes No or Yes Yes No

Upvotes: 2

Related Questions