millie0725

Reputation: 371

Subsetting strings from a column if they match multiple strings in a different column

I have a dataframe in which I'd like to subset a column to only contain strings that match multiple strings in a different column. Here's some mock data:

df1 <- data.frame(species = c("Rufl","Rufl","Soca","Assp","Assp","Elre"),
                  state = c("warmed","ambient","warmed","warmed","ambient","ambient"))

I'd like to have a dataframe containing only the species that occur with both the "warmed" and "ambient" states, removing species that match only one of them. The final dataframe would keep "Rufl" and "Assp" with their given states, as shown below:

species  state
Rufl     warmed
Rufl     ambient
Assp     warmed
Assp     ambient

I've tried a few different approaches, both with the subset function and with dplyr, but can't figure out the right way to get this to work. Here are my failed attempts:

df2 <- subset(df1$species, state == "warmed" & state == "ambient")

# or this?
df2 <- df1 %>%
        group_by(species) %>%
        filter(state == "warmed",
               state == "ambient")

Thanks for the help!

Using R version 4.0.2, Mac OS X 10.13.6

Upvotes: 2

Views: 752

Answers (2)

ThomasIsCoding

Reputation: 101343

Another base R option, using ave to count per species how many of the two states are present:

subset(
  df1,
  ave(state, species, FUN = function(x) sum(c("warmed", "ambient") %in% x)) == 2
)

gives

  species   state
1    Rufl  warmed
2    Rufl ambient
4    Assp  warmed
5    Assp ambient
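To see what the ave call contributes, here is a minimal sketch that looks at its intermediate result on the question's mock data. The variable name `counts` is only illustrative and is not part of the answer above.

```r
df1 <- data.frame(species = c("Rufl","Rufl","Soca","Assp","Assp","Elre"),
                  state = c("warmed","ambient","warmed","warmed","ambient","ambient"))

# ave() applies FUN within each species and repeats the group's result
# on every row of that group, so the output is as long as the input.
counts <- ave(df1$state, df1$species,
              FUN = function(x) sum(c("warmed", "ambient") %in% x))

# One value per row: how many of the two target states the row's species has.
# as.integer() is used here only to make the values easy to compare.
as.integer(counts)

# Rows whose species has both states
df1[as.integer(counts) == 2, ]
```

Rows belonging to a species with a count of 2 are the ones subset keeps.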

Upvotes: 0

akrun

Reputation: 887118

We need to group by species and filter with all(), which checks that both states occur somewhere within each group:

library(dplyr)
df1 %>%
   group_by(species) %>%
   # keep a species only if both states appear among its rows
   filter(all(c('warmed', 'ambient') %in% state)) %>%
   ungroup()

Output:

# A tibble: 4 x 2
#  species state  
#  <chr>   <chr>  
#1 Rufl    warmed 
#2 Rufl    ambient
#3 Assp    warmed 
#4 Assp    ambient

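The & approach doesn't work because both conditions are tested on the same row, and a single state value can never equal "warmed" and "ambient" at once. The two values sit in different rows of each species group, so the check has to be made per group rather than per row.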
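A small base R sketch of that difference, using the question's mock data. The names `row_wise` and `has_both` are illustrative only.

```r
df1 <- data.frame(species = c("Rufl","Rufl","Soca","Assp","Assp","Elre"),
                  state = c("warmed","ambient","warmed","warmed","ambient","ambient"))

# Row-wise test: each row's state is compared against both strings,
# so no row can ever satisfy the combined condition.
row_wise <- df1$state == "warmed" & df1$state == "ambient"
any(row_wise)

# Group-wise test: does each species have both states among its rows?
has_both <- tapply(df1$state, df1$species,
                   function(x) all(c("warmed", "ambient") %in% x))
has_both
```

The row-wise vector contains no TRUE values, which is why both attempts in the question return an empty result, while the group-wise check flags only the species that have both states.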


Or using subset

subset(df1, species %in% names(which(rowSums(table(df1) > 0) == 2)))
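The one-liner above is dense, so here is the same logic split into steps as a rough sketch. The intermediate names (`tab`, `n_states`, `keep`) are made up for illustration.

```r
df1 <- data.frame(species = c("Rufl","Rufl","Soca","Assp","Assp","Elre"),
                  state = c("warmed","ambient","warmed","warmed","ambient","ambient"))

# Cross-tabulate species (rows) against state (columns)
tab <- table(df1)

# Number of distinct states observed for each species
n_states <- rowSums(tab > 0)

# Species seen with exactly two distinct states
keep <- names(which(n_states == 2))

result <- subset(df1, species %in% keep)
result
```

Note that this counts distinct states rather than checking for "warmed" and "ambient" by name, so it relies on the state column containing only those two values. If the data had a third state, filtering the table columns first or using the all() approach would likely be safer.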

Upvotes: 1
