Reputation: 318
I have a dataframe that looks like this:
df <- tibble(date = c(2020-01-01, 2020-01-01, 2020-01-01, 2020-01-01, 2020-01-01, 2020-01-01, 2020-01-01, 2020-01-01),
site = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"),
treatment = c("a", "a", "b", "b", "a", "a", "b", "b"),
species = c("vetch", "clover", "vetch", "clover", "vetch", "clover", "vetch", "clover"),
frequency = c(0, 1, 1, 1 1, 0, 1, 0))
But with lots of dates and sites and treatments. What I want is to filter out observations where all frequencies of that species (across all treatments and dates) is 0 for that site. So in the above I want to remove clover at site "Z" because it did not occur at any treatment or date at that site, but I want to leave clover in site "X" because it did occur in one of the treatments. So I want:
tibble(date = c(2020-01-01, 2020-01-01, 2020-01-01, 2020-01-01, 2020-01-01, 2020-01-01),
site = c("X", "X", "X" "X", "Z", "Z"),
treatment = c("a", "a", "b", "b", "a", "b"),
species = c("vetch", "clover", "vetch", "clover", "vetch", "vetch")
frequency = c(0, 1, 1, 1, 1, 1))
My first thought was to pivot_wider, select columns then pivot_longer again, but this didn't work because the clover column was still selected by having a 1 in site "X":
df %>%
pivot_wider(names_from = species, names_prefix = "spp.", values_from = frequency, values_fill = 0) %>%
group_by(site) %>%
select_if(~ !is.numeric(.) || sum(.) != 0) %>%
pivot_longer(starts_with("spp."), names_to = "species", names_prefix = "spp.", values_to = "frequency") -> df
So I guess I need to filter instead, but I can't figure out how to do that.
Upvotes: 0
Views: 1919
Reputation: 388817
Maybe not for this dataset but generally using sum
might not be the right approach since if you have negative numbers it might cancel it out and you'll get wrong groups removed. You can use all
or any
:
With dplyr
:
library(dplyr)
df %>% group_by(date, site, species) %>% filter(any(frequency != 0))
#Also
#df %>% group_by(date, site, species) %>% filter(!all(frequency == 0))
# date site treatment species frequency
# <dbl> <chr> <chr> <chr> <dbl>
#1 2018 X a vetch 0
#2 2018 X a clover 1
#3 2018 X b vetch 1
#4 2018 X b clover 1
#5 2018 Z a vetch 1
#6 2018 Z b vetch 1
The same can be done in data.table
as well :
library(data.table)
setDT(df)[, .SD[any(frequency != 0)], .(date, site, species)]
Or in base R :
subset(df, ave(frequency != 0, date, site, species, FUN = any))
Upvotes: 4
Reputation: 5620
An easy solution can be achieved by creating another column that contains the frequency of each species grouped by date, site and species (ignoring treatment). Then you can easily filter using this new column and afterwards eliminate it.
library(tidyverse)
df %>%
# Group by date site and species
group_by(date, site, species) %>%
# Create new column that sums frequency values by grouping variables
mutate(appears = sum(frequency)) %>%
# ignore rows where appears = 0
filter(appears != 0) %>%
# Eliminate appears column
select(-appears)
Upvotes: 3