Christopher Dean

Reputation: 89

Checking for membership in a column of list vectors

Suppose I have a data.frame in the following format:

Site     CowId    Result
FarmA    1000     c("Aerococcus viridans", "Staphylococcus chromogenes")
FarmA    1001     Staphylococcus aureus
FarmA    1002     Contaminated

How can I check, for each row, whether "Staphylococcus chromogenes" is a member of the set in the Result column, without unnesting any of the vectors stored in that column?

df <- structure(list(Site = structure(c(1L, 1L, 1L), .Label = "FarmA", class = "factor"), CowId = 1000:1002, Result = list(c("Aerococcus viridans", "Staphylococcus chromogenes"), "Staphylococcus aureus", "Contaminated")), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), groups = structure(list( Site = structure(c(1L, 1L, 1L), .Label = "FarmA", class = "factor"), CowId = 1000:1002, .rows = structure(list(1L, 2L, 3L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame" ), row.names = c(NA, -3L), .drop = TRUE))

Upvotes: 5

Views: 143

Answers (4)

PaulS

Reputation: 25323

Another possible solution, based on dplyr:

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(Presence = "Staphylococcus chromogenes" %in% Result)

#> # A tibble: 3 × 4
#> # Rowwise:  Site, CowId
#>   Site  CowId Result    Presence
#>   <fct> <int> <list>    <lgl>   
#> 1 FarmA  1000 <chr [2]> TRUE    
#> 2 FarmA  1001 <chr [1]> FALSE   
#> 3 FarmA  1002 <chr [1]> FALSE

Upvotes: 4

AndrewGB

Reputation: 16836

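Another option is to use the tidyverse. Here, I add a new column with mutate so that you can see where the taxon occurs. I use str_detect within map_lgl to check whether the string occurs in each element of a given list entry, then return TRUE if it occurs at all in that entry (i.e., using any). Note that str_detect matches the pattern as a regular expression anywhere in the string, so this is a substring check rather than an exact membership test.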

library(tidyverse)

df %>%
  mutate(taxa_present = map_lgl(Result, function(v)
    str_detect(v, "Staphylococcus chromogenes") %>% any()))

Output

# A tibble: 3 × 4
# Groups:   Site, CowId [3]
  Site  CowId Result    taxa_present
  <fct> <int> <list>    <lgl>       
1 FarmA  1000 <chr [2]> TRUE        
2 FarmA  1001 <chr [1]> FALSE       
3 FarmA  1002 <chr [1]> FALSE  

Or if you just want a simple logical vector, then you could just do:

map_lgl(df$Result, function(v)
  str_detect(v, "Staphylococcus chromogenes") %>% any())

#[1]  TRUE FALSE FALSE

Upvotes: 3

ThomasIsCoding

Reputation: 101064

Try grepl + toString

> grepl("Staphylococcus chromogenes", sapply(df$Result, toString), fixed = TRUE)
[1]  TRUE FALSE FALSE
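
One thing worth keeping in mind with this approach: grepl on the collapsed string is a substring search, whereas %in% tests exact membership. The sketch below illustrates the difference; `result` is a hypothetical stand-in for df$Result and `partial` is an illustrative search term, neither of which comes from the question.

```r
# Hypothetical stand-in for df$Result: a list of character vectors
result <- list(
  c("Aerococcus viridans", "Staphylococcus chromogenes"),
  "Staphylococcus aureus",
  "Contaminated"
)

# Illustrative search term that is only part of a species name
partial <- "Staphylococcus"

# Substring search: collapse each element to one string, then search inside it
substring_hit <- grepl(partial, sapply(result, toString), fixed = TRUE)

# Exact membership: TRUE only if an element equals the search term exactly
exact_hit <- sapply(result, function(x) partial %in% x)

substring_hit
#> [1]  TRUE  TRUE FALSE
exact_hit
#> [1] FALSE FALSE FALSE
```

For the full name "Staphylococcus chromogenes" both approaches agree on this data, so the choice only matters if a search term could be a fragment of a longer entry.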

Upvotes: 4

Dave2e

Reputation: 24069

You could use lapply/sapply to test whether the string is a member of each element of df$Result.

testString <- "Staphylococcus chromogenes"
sapply(df$Result, function(results) testString %in% results)
#[1]  TRUE FALSE FALSE
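
As a possible variation on the same idea, vapply can be used in place of sapply so that the return type is checked. This is only a sketch: `result` is a hypothetical stand-in for df$Result, and `target` and `present` are illustrative names.

```r
# Hypothetical stand-in for df$Result: a list of character vectors
result <- list(
  c("Aerococcus viridans", "Staphylococcus chromogenes"),
  "Staphylococcus aureus",
  "Contaminated"
)
target <- "Staphylococcus chromogenes"

# vapply() errors unless each element yields exactly one logical value,
# so the result is always a logical vector (even for a zero-length list)
present <- vapply(result, function(x) target %in% x, logical(1))

present
#> [1]  TRUE FALSE FALSE

# Single answer to "is it a member of any of the sets?"
any(present)
#> [1] TRUE
```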

Upvotes: 6
