(R) Filter rows based on string names if the only resulting match in another column is NA

Question

The title may sound kinda weird but I have found no way of better defining my issue.

Here an example data set:

test = data.frame(genus = c("Acicarpha", "Acicarpha", "Acicarpha", "Acicarpha", "Acisanthera", "Acisanthera", "Acisanthera", "Acisanthera", "Acmella", "Acmella"), sp1 = c("NA", "bonariensis", "bonariensis", "spathulata", NA, "variabilis", "variabilis", "variabilis", NA, NA))

As you can see, I have a few species names (genus+sp1) possible: Acicarpha NA, Acicarpha bonariensis, Acicarpha spathulata, Acisanthera variabilis, Acisanthera NA, and Acmella NA.

Here's the deal: I'm trying to select only the row related to Acmella NA since the only returning value on the sp1 column is NA. Other species also have NA, but they do not have only NA. How can I do this? I'm bashing my head.

Christopher Belanger · Accepted Answer

Here's some code that does what I think you're asking for. It has four steps:

Group the rows by genus.
Make a new column called all_sp1_na that is TRUE if all of each genus's sp1 observations are NA, FALSE otherwise (i.e. FALSE if at least one sp1 observation is not NA for that genus).
Filter for rows where all_sp1_na is true.
Remove the temporary column all_sp1_na.

library(tidyverse)

test %>%
  group_by(genus) %>%
  mutate(all_sp1_na = all(is.na(sp1))) %>%
  filter(all_sp1_na) %>%
  select(-all_sp1_na)

And it gives this result:

# A tibble: 2 x 2
# Groups:   genus [1]
  genus   sp1  
     
1 Acmella NA   
2 Acmella NA

Let me know if you're looking for something else.

(R) Filter rows based on string names if the only resulting match in another column is NA

Answers (2)

Related Questions