I'm trying to use dplyr's case_when() to mutate a new column based on conditions in other columns. However, I want the new column to be a list-column, where each row holds a character vector.
Consider the following toy data. Based on it, I want to summarize the geographical territory of the UK.
library(tibble)
set.seed(1)
my_mat <- matrix(sample(c(TRUE, FALSE), size = 40, replace = TRUE), nrow = 10, ncol = 4)
colnames(my_mat) <- c("England", "Wales", "Scotland", "Northern_Ireland")
my_df <- as_tibble(my_mat)
> my_df
## # A tibble: 10 x 4
## England Wales Scotland Northern_Ireland
## <lgl> <lgl> <lgl> <lgl>
## 1 TRUE TRUE TRUE FALSE
## 2 FALSE TRUE TRUE FALSE
## 3 TRUE TRUE TRUE TRUE
## 4 TRUE TRUE TRUE FALSE
## 5 FALSE TRUE TRUE TRUE
## 6 TRUE FALSE TRUE TRUE
## 7 TRUE FALSE FALSE FALSE
## 8 TRUE FALSE TRUE TRUE
## 9 FALSE FALSE TRUE FALSE
## 10 FALSE TRUE FALSE FALSE
I want to mutate a new collective_geo_territory column, following these rules:
1. If England, Scotland, Wales, and Northern_Ireland are all TRUE, then we say this is United_Kingdom.
2. If England, Scotland, and Wales are all TRUE, then we say this is Great_Britain.
3. Otherwise, return a vector with the names of the columns that are TRUE.
So far, I know how to address conditions (1) and (2) detailed above, using the following code:
library(dplyr)
my_df %>%
  mutate(collective_geo_territory = case_when(
    England == TRUE & Wales == TRUE & Scotland == TRUE & Northern_Ireland == TRUE ~ "United_Kingdom",
    England == TRUE & Wales == TRUE & Scotland == TRUE ~ "Great_Britain"
  ))
However, I want the collective_geo_territory column to be a list-column that looks like the following:
## # A tibble: 10 x 5
## England Wales Scotland Northern_Ireland collective_geo_territory
## <lgl> <lgl> <lgl> <lgl> <list>
## 1 TRUE TRUE TRUE FALSE <chr [1]> # c("Great_Britain")
## 2 FALSE TRUE TRUE FALSE <chr [2]> # c("Wales", "Scotland")
## 3 TRUE TRUE TRUE TRUE <chr [1]> # c("United_Kingdom")
## 4 TRUE TRUE TRUE FALSE <chr [1]> # c("Great_Britain")
## 5 FALSE TRUE TRUE TRUE <chr [3]> # c("Wales", "Scotland", "Northern_Ireland")
## 6 TRUE FALSE TRUE TRUE <chr [3]> # c("England", "Scotland", "Northern_Ireland")
## 7 TRUE FALSE FALSE FALSE <chr [1]> # c("England")
## 8 TRUE FALSE TRUE TRUE <chr [3]> # c("England", "Scotland", "Northern_Ireland")
## 9 FALSE FALSE TRUE FALSE <chr [1]> # c("Scotland")
## 10 FALSE TRUE FALSE FALSE <chr [1]> # c("Wales")
Here's one approach:
library(purrr) # used for pmap
my_df %>%
  mutate(collective_geo_territory = case_when(
    England & Wales & Scotland & Northern_Ireland ~ list("United_Kingdom"),
    England & Wales & Scotland ~ list("Great_Britain"),
    TRUE ~ pmap(my_df, ~ names(my_df)[c(...)])
  ))
Essentially, the last line works as follows:
- The condition is simply TRUE, because case_when() terminates on the first relevant TRUE. So, we will only reach this line if conditions 1 and 2 have failed.
- We iterate over the rows of the data (pmap) and apply the following function: get the names of the columns in my dataset (names) and subset them ([]) only to those where the values are TRUE (contained in c()).
A few additional notes:
- I wrapped the outputs (e.g., "United_Kingdom") in a list() because case_when() requires consistent types for the resulting vector.
- I simplified England == TRUE (and the same for the other countries) to just England. Since these columns already contain logical values, there's no need to recheck their values, and this makes the code a bit more readable.
. Since these columns already contain logical values, there's no need to recheck their values, and this makes the code a bit more readable.Upvotes: 3