Reputation: 13
I am attempting to filter out rows in which there is an occurrence only in a single column, and show only those that appear in more than one.
An example database:
col1 | col2 | col3 | col4 | col5 | col6 | col7 | |
---|---|---|---|---|---|---|---|
row1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
row2 | 6 | 0 | 42 | 0 | 0 | 0 | 0 |
row3 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
row4 | 0 | 0 | 0 | 0 | 0 | 0 | 87 |
In this case, I would want my filter to only return row2 and row3. I have been attempting to use dplyr and the filter feature; although I haven't found a way to capture all 7 columns. I'm unable to use rowSums()
since the actual value isn't important in this case, only the frequency alongside another column.
I have thus far only accurately captured the output I am looking for when exclusively comparing two columns for values >0:
mydata.df %>%
filter(col1>0 & col2>0)
Though it's easy to see why this approach isn't feasible, especially with 7 columns to compare. I have tried stringing multiple conditions to compare amongst multiple columns without success. Short of writing a separate filter for each of the combinations of columns independently, I'm sure I am missing an obvious and simple solution?
Upvotes: 1
Views: 1013
Reputation: 887981
We can use filter
with rowSums
library(dplyr)
mydata.df %>%
filter(rowSums(. > 0) >=2)
col1 col2 col3 col4 col5 col6 col7
row2 6 0 42 0 0 0 0
row3 0 0 0 0 1 1 0
Or if it is pairwise combination of columns that we are checking
library(purrr)
mydata.df %>%
filter(combn(., 2, FUN = function(x) rowSums(x > 0) == 2,
simplify = FALSE) %>%
reduce(`|`))
col1 col2 col3 col4 col5 col6 col7
row2 6 0 42 0 0 0 0
row3 0 0 0 0 1 1 0
mydata.df <- structure(list(col1 = c(1L, 6L, 0L, 0L), col2 = c(0L, 0L, 0L,
0L), col3 = c(0L, 42L, 0L, 0L), col4 = c(0L, 0L, 0L, 0L), col5 = c(0L,
0L, 1L, 0L), col6 = c(0L, 0L, 1L, 0L), col7 = c(0L, 0L, 0L, 87L
)), class = "data.frame", row.names = c("row1", "row2", "row3",
"row4"))
Upvotes: 1