Reputation: 31

Check if a list contains any value from a vector in R

I want to create a binary column that shows whether another column, which is a list of characters, contains any value from a vector.

Specifically, I want to create a column that says whether a person has experienced their manager leaving the company in the past year. For this, I have an all_manager column that is a list of all managers a person had in the last year, and a terminated_managers vector with the names of all managers who were terminated in the past year.

df$all_manager

[[1]]
[1] John Mary

[[2]]
[1] Paul John

[[3]]
[1] Mary Tom Lilly

terminated_managers <- c("Mary", "Bill")

And I want to create manager_termed_yn column such that:

df$manager_termed_yn

[1]TRUE
[2]FALSE
[3]TRUE

I'll appreciate your help! First time posting, so apologies that the example is not the best.

Upvotes: 2

Answers (4)

AndrewGB

Reputation: 16836

Here is a tidyverse solution:

library(tidyverse)

df %>%
  rowwise %>%
  mutate(manager_termed_yn = any(unlist(all_manager) %in% terminated_managers))

Output

  id      all_manager manager_termed_yn
1  1       John, Mary              TRUE
2  2       Paul, John             FALSE
3  3 Mary, Tom, Lilly              TRUE

Data

df <- structure(list(id = 1:3, all_manager = list(c("John", "Mary"), 
    c("Paul", "John"), c("Mary", "Tom", "Lilly"))), row.names = c(NA, 
-3L), class = "data.frame")

terminated_managers <- c("Mary", "Bill")
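For readers who would rather not load the tidyverse, the same row-by-row check can be sketched in base R. This is only an illustration using the example data above; vapply() stands in for rowwise() and mutate(), and it assumes all_manager is a list column with one character vector per row.

```r
# Example data copied from this answer
df <- structure(list(id = 1:3, all_manager = list(c("John", "Mary"),
    c("Paul", "John"), c("Mary", "Tom", "Lilly"))), row.names = c(NA,
-3L), class = "data.frame")
terminated_managers <- c("Mary", "Bill")

# One logical per row: is any of that row's managers in terminated_managers?
df$manager_termed_yn <- vapply(
  df$all_manager,
  function(m) any(m %in% terminated_managers),
  logical(1)
)
df$manager_termed_yn
#> [1]  TRUE FALSE  TRUE
```

vapply() with logical(1) raises an error if any row fails to produce exactly one TRUE/FALSE, which makes malformed rows easier to spot.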

Upvotes: 0

socialscientist

Reputation: 4232

Let's first create some example data to work with that looks similar to your own.

# Example data 
set.seed(123)
data <- replicate(5, list(paste0(sample(letters, size = 5, replace = TRUE), collapse = "")))
data
#> [[1]]
#> [1] "osncj"
#> 
#> [[2]]
#> [1] "rvket"
#> 
#> [[3]]
#> [1] "nvyze"
#> 
#> [[4]]
#> [1] "syyic"
#> 
#> [[5]]
#> [1] "hzgji"

# Example vector
vec <- c("osn", "rvket", "foo")

Your data are currently stored as a list. From the looks of it, each element of the list holds one name stored as a character vector. I am guessing this because, in the output you showed, every entry that starts with double brackets (e.g. [[1]], [[2]]) is followed by only one line that starts with single brackets, and the index on that line is always [1].

class(data)
#> [1] "list"

Given you have this data structure, you can convert this into a vector:

df2vec <- unlist(data)
df2vec
#> [1] "osncj" "rvket" "nvyze" "syyic" "hzgji"

We can determine if the values in df2vec exactly match values in vec:

df2vec %in% vec
#> [1] FALSE  TRUE FALSE FALSE FALSE

And here are the values that appear in both:

df2vec[df2vec %in% vec]
#> [1] "rvket"

If we instead want partial matches (for example, returning TRUE when matching "Rob" against "Robert Frost"), we need substring pattern matching:

partials <- grepl(df2vec, pattern = paste(vec, collapse = "|"))

partials
#> [1]  TRUE  TRUE FALSE FALSE FALSE

df2vec[partials]
#> [1] "osncj" "rvket"
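The steps above assume one string per list element. If each element instead holds several names, unlist() would flatten them and lose track of which row each name came from. A possible sketch that keeps one result per element is shown below; the data and the partial name "Mar" are hypothetical, chosen only for illustration.

```r
# Hypothetical data: one character vector of names per list element
all_manager <- list(c("John", "Mary"), c("Paul", "John"), c("Mary", "Tom", "Lilly"))
partial_names <- c("Mar", "Bill")   # partial names, for illustration only

# Combine the partial names into a single regular expression: "Mar|Bill"
pattern <- paste(partial_names, collapse = "|")

# One TRUE/FALSE per element: does any name in it match the pattern?
partial_yn <- vapply(all_manager, function(m) any(grepl(pattern, m)), logical(1))
partial_yn
#> [1]  TRUE FALSE  TRUE
```

Because the pattern is treated as a regular expression, names containing characters such as "." or "(" would need escaping before being pasted together.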

Upvotes: 0

Kra.P

Reputation: 15123

Let the list l be

l <- list("John Mary","Paul John", "Mary Tom Lilly")
l

[[1]]
[1] "John Mary"

[[2]]
[1] "Paul John"

[[3]]
[1] "Mary Tom Lilly"

Then

library(stringr)
sapply(l, function(x) any(str_split(x, " ", simplify = TRUE) %in% terminated_managers))

[1]  TRUE FALSE  TRUE
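One thing to watch for when testing membership: match() returns positions in the lookup vector rather than counts, so a check based on summing its results depends on which manager matched, whereas %in% does not. A small illustration follows; the row c("Paul", "Bill") is hypothetical and not from the question.

```r
terminated_managers <- c("Mary", "Bill")
row_names <- c("Paul", "Bill")   # hypothetical row, not from the question

# match() gives the position of each name in terminated_managers (NA if absent)
pos <- match(row_names, terminated_managers)
pos
#> [1] NA  2

# Summing positions and comparing to 1 misses "Bill", who sits at position 2
sum_check <- sum(pos, na.rm = TRUE) == 1
sum_check
#> [1] FALSE

# %in% only asks whether each name is present at all
in_check <- any(row_names %in% terminated_managers)
in_check
#> [1] TRUE
```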

Upvotes: 0

Zheyuan Li

Reputation: 73265

all_manager <- list(c("John", "Mary"), c("Paul", "John"), c("Mary", "Tom", "Lilly"))
terminated_managers <- c("Mary", "Bill")

We can use

colSums(sapply(all_manager, "%in%", x = terminated_managers)) > 0
#[1]  TRUE FALSE  TRUE
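Note that colSums() needs a matrix, and sapply() only returns one here when terminated_managers has at least two elements. With a single name, sapply() simplifies to a plain vector and colSums() fails. A sketch that does not depend on that simplification, using intersect() and lengths() as a swapped-in technique (one_terminated is a hypothetical single-name vector):

```r
all_manager <- list(c("John", "Mary"), c("Paul", "John"), c("Mary", "Tom", "Lilly"))
one_terminated <- "Mary"   # hypothetical single-name case

# intersect() keeps the names common to both; a non-empty result means a match
hits <- lapply(all_manager, intersect, one_terminated)
result <- lengths(hits) > 0
result
#> [1]  TRUE FALSE  TRUE
```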

Upvotes: 3
