dplyr filter columns with value 0 for all rows with unique combinations of other columns

Question

I have a dataframe that looks like this:

library(tidyverse)

df <- tibble(date = as.Date(rep("2020-01-01", 8)),
             site = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"),
             treatment = c("a", "a", "b", "b", "a", "a", "b", "b"),
             species = c("vetch", "clover", "vetch", "clover", "vetch", "clover", "vetch", "clover"),
             frequency = c(0, 1, 1, 1, 1, 0, 1, 0))

But with lots of dates, sites and treatments. What I want is to filter out a species at a site when all of its frequencies at that site (across all treatments and dates) are 0. So in the example above I want to remove clover at site "Z" because it did not occur in any treatment or on any date there, but I want to keep clover at site "X" because it occurred in at least one treatment. So I want:

tibble(date = as.Date(rep("2020-01-01", 6)),
       site = c("X", "X", "X", "X", "Z", "Z"),
       treatment = c("a", "a", "b", "b", "a", "b"),
       species = c("vetch", "clover", "vetch", "clover", "vetch", "vetch"),
       frequency = c(0, 1, 1, 1, 1, 1))

My first thought was to pivot_wider, select columns, then pivot_longer again, but this didn't work: the clover column was still selected because it has a 1 at site "X":

  df %>%
    pivot_wider(names_from = species, names_prefix = "spp.", values_from = frequency, values_fill = 0) %>%
    group_by(site) %>%
    select_if(~ !is.numeric(.) || sum(.) != 0) %>%
    pivot_longer(starts_with("spp."), names_to = "species", names_prefix = "spp.", values_to = "frequency") -> df

So I guess I need to filter instead, but I can't figure out how to do that.

Jonathan V. Solórzano · Accepted Answer

A simple solution is to create a helper column that holds the summed frequency of each species, grouped by date, site and species (ignoring treatment). You can then filter on this new column and drop it afterwards.

library(tidyverse)
df %>%
    # Group by date, site and species
    group_by(date, site, species) %>%
    # Create a new column that sums frequency within each group
    mutate(appears = sum(frequency)) %>%
    # Drop rows where appears is 0
    filter(appears != 0) %>%
    # Remove the helper column
    select(-appears)
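The question asks for species that never occur at a site across all treatments and dates, but the grouping above also splits by date, so with several dates a species could be dropped on one date and kept on another. A minimal sketch of a variant, assuming frequencies are non-negative counts, groups by site and species only and keeps a group when any() of its frequencies is non-zero, which avoids the helper column; ungroup() then returns a plain tibble. The data is the (corrected) example from the question.

```r
library(dplyr)

df <- tibble(
  date = as.Date(rep("2020-01-01", 8)),
  site = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"),
  treatment = c("a", "a", "b", "b", "a", "a", "b", "b"),
  species = c("vetch", "clover", "vetch", "clover", "vetch", "clover", "vetch", "clover"),
  frequency = c(0, 1, 1, 1, 1, 0, 1, 0)
)

result <- df %>%
  # One group per site/species pair, pooling every date and treatment
  group_by(site, species) %>%
  # Keep all rows of a group if the species occurred at least once at that site
  filter(any(frequency != 0)) %>%
  # Drop the grouping so later steps see an ordinary tibble
  ungroup()

print(result)
```

With the example data this keeps the six rows of the desired output: all four rows for site "X" and the two vetch rows for site "Z".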
