Reputation: 31
I want to create a binary column that indicates whether another column, a list column of character vectors, contains any value from a given vector.
Specifically, I want a column that says whether someone has experienced their manager leaving the company in the past year. For this, I have an all_manager column that lists all the managers a person had in the last year, and a terminated_managers vector with the names of all managers who were terminated in the past year.
df$all_manager
[[1]]
[1] John Mary
[[2]]
[1] Paul John
[[3]]
[1] Mary Tom Lilly
terminated_managers <- c("Mary", "Bill")
And I want to create a manager_termed_yn column such that:
df$manager_termed_yn
[1]  TRUE FALSE  TRUE
I'd appreciate your help! This is my first time posting, so apologies if the example is not the best.
Upvotes: 2
Views: 2200
Reputation: 16836
Here is a tidyverse solution:
library(tidyverse)
df %>%
  rowwise() %>%
  # Check each row's manager vector against the terminated names
  mutate(manager_termed_yn = any(unlist(all_manager) %in% terminated_managers)) %>%
  ungroup()
Output
id all_manager manager_termed_yn
1 1 John, Mary TRUE
2 2 Paul, John FALSE
3 3 Mary, Tom, Lilly TRUE
Data
df <- structure(list(id = 1:3, all_manager = list(c("John", "Mary"),
c("Paul", "John"), c("Mary", "Tom", "Lilly"))), row.names = c(NA,
-3L), class = "data.frame")
terminated_managers <- c("Mary", "Bill")
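For comparison, the same row-by-row check can be sketched in base R without the tidyverse. This is only an illustration under the assumption that all_manager is a list column of character vectors, as in the Data block above; vapply is swapped in for rowwise here and is not part of the original answer.

```r
# Rebuild the example data with a list column (same values as the Data block)
df <- data.frame(id = 1:3)
df$all_manager <- list(c("John", "Mary"), c("Paul", "John"), c("Mary", "Tom", "Lilly"))
terminated_managers <- c("Mary", "Bill")

# For each row, test whether any of its managers is in terminated_managers;
# logical(1) makes vapply insist on exactly one TRUE/FALSE per row
df$manager_termed_yn <- vapply(
  df$all_manager,
  function(m) any(m %in% terminated_managers),
  logical(1)
)
df$manager_termed_yn
```

Because vapply checks the type and length of each result, a malformed element in the list column raises an error instead of silently producing a column of the wrong shape.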
Upvotes: 0
Reputation: 4232
Let's first create some example data to work with that looks similar to your own.
# Example data
set.seed(123)
data <- replicate(5, list(paste0(sample(letters, size = 5, replace = T), collapse = "")))
data
#> [[1]]
#> [1] "osncj"
#>
#> [[2]]
#> [1] "rvket"
#>
#> [[3]]
#> [1] "nvyze"
#>
#> [[4]]
#> [1] "syyic"
#>
#> [[5]]
#> [1] "hzgji"
# Example vector
vec <- c("osn", "rvket", "foo")
Your data are currently stored as a list. From the looks of it, each element of the list holds one name stored as a character vector. I am guessing this because, in the output you showed, every row that starts with double brackets (e.g. [[1]], [[2]], etc.) is followed by only one row that starts with single brackets, and its index is always [1].
class(data)
#> [1] "list"
Given this data structure, you can convert the list into a vector:
df2vec <- unlist(data)
df2vec
#> [1] "osncj" "rvket" "nvyze" "syyic" "hzgji"
We can determine if the values in df2vec exactly match values in vec:
df2vec %in% vec
#> [1] FALSE TRUE FALSE FALSE FALSE
And here are the values that appear in both:
df2vec[df2vec %in% vec]
#> [1] "rvket"
If we instead want partial matches that would, for example, return TRUE when matching "Rob" to "Robert Frost", we need substring pattern matching:
partials <- grepl(pattern = paste(vec, collapse = "|"), x = df2vec)
partials
#> [1] TRUE TRUE FALSE FALSE FALSE
df2vec[partials]
#> [1] "osncj" "rvket"
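If the list elements turn out to hold several names each, as in the question's all_manager column, unlist() would flatten them and lose the link between names and rows. A minimal sketch of the same partial-matching idea applied per element follows; the data are copied from the question, and the pattern assumes the names contain no regular-expression metacharacters.

```r
# Data as described in the question: several names per list element
all_manager <- list(c("John", "Mary"), c("Paul", "John"), c("Mary", "Tom", "Lilly"))
terminated_managers <- c("Mary", "Bill")

# One alternation pattern, e.g. "Mary|Bill" (assumes plain names, no regex metacharacters)
pattern <- paste(terminated_managers, collapse = "|")

# One logical per list element: TRUE if any name in that element matches the pattern
manager_termed_yn <- vapply(
  all_manager,
  function(m) any(grepl(pattern, m)),
  logical(1)
)
manager_termed_yn
```

Substring matching is looser than %in%: a terminated name such as "Rob" would also flag "Robert", so exact matching is the safer choice when full names are available.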
Upvotes: 0
Reputation: 15123
Let the list l be defined as
l <- list("John Mary","Paul John", "Mary Tom Lilly")
l
[[1]]
[1] "John Mary"
[[2]]
[1] "Paul John"
[[3]]
[1] "Mary Tom Lilly"
Then
library(stringr)
sapply(l, function(x) any(str_split(x, " ", simplify = TRUE) %in% terminated_managers))
[1] TRUE FALSE TRUE
Upvotes: 0
Reputation: 73265
all_manager <- list(c("John", "Mary"), c("Paul", "John"), c("Mary", "Tom", "Lilly"))
terminated_managers <- c("Mary", "Bill")
We can use
colSums(sapply(all_manager, "%in%", x = terminated_managers)) > 0
#[1] TRUE FALSE TRUE
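To see why this works, it may help to look at the intermediate matrix. The sketch below reuses the data from this answer; the names m and result are introduced here only for illustration.

```r
all_manager <- list(c("John", "Mary"), c("Paul", "John"), c("Mary", "Tom", "Lilly"))
terminated_managers <- c("Mary", "Bill")

# Each call is `%in%`(x = terminated_managers, table = <one list element>),
# so rows correspond to terminated_managers and columns to list elements
m <- sapply(all_manager, "%in%", x = terminated_managers)
dim(m)

# A column sum above zero means at least one terminated manager appears in that element
result <- colSums(m) > 0
result
```

One caveat: if terminated_managers holds a single name, sapply simplifies to a plain vector rather than a matrix and colSums raises an error, so an any()-based check per element is more robust in that case.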
Upvotes: 3