Filter rows based upon 2 or more occurrences across columns in R

Question

I am attempting to filter out rows in which there is an occurrence only in a single column, and show only those that appear in more than one.

An example database:

	col1	col3	col5	col6	col7
row1	1	0	0	0	0
row2	6	42	0	0	0
row3	0	0	1	1	0
row4	0	0	0	0	87

In this case, I would want my filter to only return row2 and row3. I have been attempting to use dplyr and the filter feature; although I haven't found a way to capture all 7 columns. I'm unable to use rowSums() since the actual value isn't important in this case, only the frequency alongside another column.

I have thus far only accurately captured the output I am looking for when exclusively comparing two columns for values >0:

mydata.df %>%
  filter(col1>0 & col2>0)

Though it's easy to see why this approach isn't feasible, especially with 7 columns to compare. I have tried stringing multiple conditions to compare amongst multiple columns without success. Short of writing a separate filter for each of the combinations of columns independently, I'm sure I am missing an obvious and simple solution?

akrun · Accepted Answer

We can use filter with rowSums

library(dplyr)
mydata.df %>% 
   filter(rowSums(. > 0) >=2)
     col1 col2 col3 col4 col5 col6 col7
row2    6    0   42    0    0    0    0
row3    0    0    0    0    1    1    0

Or if it is pairwise combination of columns that we are checking

library(purrr)
mydata.df %>% 
    filter(combn(., 2, FUN = function(x) rowSums(x > 0) == 2, 
         simplify = FALSE) %>% 
      reduce(`|`))
     col1 col2 col3 col4 col5 col6 col7
row2    6    0   42    0    0    0    0
row3    0    0    0    0    1    1    0

data

mydata.df <- structure(list(col1 = c(1L, 6L, 0L, 0L), col2 = c(0L, 0L, 0L, 
0L), col3 = c(0L, 42L, 0L, 0L), col4 = c(0L, 0L, 0L, 0L), col5 = c(0L, 
0L, 1L, 0L), col6 = c(0L, 0L, 1L, 0L), col7 = c(0L, 0L, 0L, 87L
)), class = "data.frame", row.names = c("row1", "row2", "row3", 
"row4"))

Filter rows based upon 2 or more occurrences across columns in R

Answers (1)

data

Related Questions