Reputation: 600
I have this data frame,
id <- seq(1:4)
var_1 <- c(TRUE, FALSE, FALSE, FALSE)
var_2 <- c(FALSE, TRUE, FALSE, FALSE)
var_3 <- c(FALSE, TRUE, FALSE, FALSE)
var_4 <- c(FALSE, FALSE, TRUE, FALSE)
var_5 <- c(FALSE, TRUE, TRUE, FALSE)
df <- data.frame(id, var_1, var_2, var_3, var_4, var_5)
df
> df
id var_1 var_2 var_3 var_4 var_5
1 1 TRUE FALSE FALSE FALSE FALSE
2 2 FALSE TRUE TRUE FALSE TRUE
3 3 FALSE FALSE FALSE TRUE TRUE
4 4 FALSE FALSE FALSE FALSE FALSE
Now, I want to look on every row and see if any of the columns have TRUE
values. If yes, I want to record it in a new column result
. For the first row, I have TRUE
in only var_1
. So I record, the variable name where TRUE
found, i.e. var_1
. For second row, I have TRUE
in var_2
, var_3
and var_5
. The entry in the result column should be var_2, var_3, var_5
. Lastly, row 4 has no TRUE
, so it should be NA
. The resulting data.frame should look like this:
id var_1 var_2 var_3 var_4 var_5 result
1 1 TRUE FALSE FALSE FALSE FALSE var_1
2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
4 4 FALSE FALSE FALSE FALSE FALSE <NA>
I can do this,
library(tidyverse)
df |>
mutate(across(where(is.logical),
~ case_when(.x ~ cur_column()), .names = "{.col}")) |>
unite("result", var_1:var_5, sep = ", ", na.rm = TRUE)
id result
1 1 var_1
2 2 var_2, var_3, var_5
3 3 var_4, var_5
4 4
However, the problem is the number of variables should not be fixed to 5. It should be scalable to any variable number. Any idea how to achive that?
Upvotes: 5
Views: 128
Reputation: 270045
1) dplyr Define a function toNameString
that takes a one-row data frame (or named vector) of logical columns (or components) and returns a comma and space separated character string of the column names corresponding to TRUE. Then apply it to the non-id columns in each row.
(If you need to apply this to a data frame that has other columns as well then change pick(everything())
appropriately. For example, pick(where(is.logical))
would pick out only the logical columns or pick(starts_with("var_"))
would pick out only the columns whose name starts with var_
.)
library(dplyr)
toNameString <- function(x) toString(names(x)[unlist(x)])
df |> mutate(result = toNameString(pick(everything())), .by = id)
giving
id var_1 var_2 var_3 var_4 var_5 result
1 1 TRUE FALSE FALSE FALSE FALSE var_1
2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
4 4 FALSE FALSE FALSE FALSE FALSE
2) Base R Using only base R and toNameString
from above apply it to each row except for column 1.
df |>
list(x = _) |>
with(transform(x, result = apply(x[-1], 1, toNameString)))
Input
df <- data.frame(
id = 1:4,
var_1 = c(TRUE, FALSE, FALSE, FALSE),
var_2 = c(FALSE, TRUE, FALSE, FALSE),
var_3 = c(FALSE, TRUE, FALSE, FALSE),
var_4 = c(FALSE, FALSE, TRUE, FALSE),
var_5 = c(FALSE, TRUE, TRUE, FALSE)
)
Upvotes: 1
Reputation: 40171
A purrr
solution inspired by @jpsmith:
df %>%
mutate(result = pmap_chr(across(var_1:var_5),
~ if(any(c(...))) {toString(names(c(...))[c(...)])}
else {NA_character_}))
id var_1 var_2 var_3 var_4 var_5 result
1 1 TRUE FALSE FALSE FALSE FALSE var_1
2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
4 4 FALSE FALSE FALSE FALSE FALSE <NA>
Upvotes: 1
Reputation: 102529
nms <- names(df)[-1]
df$result <- by(unlist(df[-1]), row(df[-1]), \(v) ifelse(all(!v), NA, toString(nms[v])))
which gives
id var_1 var_2 var_3 var_4 var_5 result
1 1 TRUE FALSE FALSE FALSE FALSE var_1
2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
4 4 FALSE FALSE FALSE FALSE FALSE <NA>
df %>%
left_join(
df %>%
pivot_longer(
cols = !id,
names_to = "var",
values_to = "value"
) %>%
reframe(result = toString(var[value]), .by = id),
by = "id"
)
which goes to
id var_1 var_2 var_3 var_4 var_5 result
1 1 TRUE FALSE FALSE FALSE FALSE var_1
2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
4 4 FALSE FALSE FALSE FALSE FALSE
Upvotes: 2
Reputation: 17656
In base R you can try:
df$new_col <- apply(df[ , -1], 1, \(x) {
if (any(x)) {paste(names(df)[-1][x], collapse = ", ")}
else {NA}
})
Output:
# id var_1 var_2 var_3 var_4 var_5 new_col
# 1 1 TRUE FALSE FALSE FALSE FALSE var_1
# 2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
# 3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
# 4 4 FALSE FALSE FALSE FALSE FALSE <NA>
Here I did df[,-1]
to remove the id
column, but your data have multiple columns that are a mix between what you want to use and what you want to exclude, you could tweak to below. Here, grep
identifies all columns with var
in the name (wantcols
), but you could define it however you want:
# define columns of interest
wantcols <- grep("var", names(df), value = TRUE)
df$new_col <- apply(df[wantcols], 1, \(x) {
if (any(x)) {paste(names(df[wantcols])[x], collapse = ", ")}
else {NA}
})
Upvotes: 1
Reputation: 24845
You can do the following:
left_join(
df,
pivot_longer(df,cols=starts_with("var_")) |>
filter(value==T) |>
reframe(result = toString(name),.by=id)
)
Output:
id var_1 var_2 var_3 var_4 var_5 result
1 1 TRUE FALSE FALSE FALSE FALSE var_1
2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
4 4 FALSE FALSE FALSE FALSE FALSE <NA>
You can also do this without pivoting longer.. For example:
df$result <- apply(df[,-1], 1, \(x) toString(names(x)[x]))
Output:
id var_1 var_2 var_3 var_4 var_5 result
1 1 TRUE FALSE FALSE FALSE FALSE var_1
2 2 FALSE TRUE TRUE FALSE TRUE var_2, var_3, var_5
3 3 FALSE FALSE FALSE TRUE TRUE var_4, var_5
4 4 FALSE FALSE FALSE FALSE FALSE
Upvotes: 3