Reputation: 51
I'm relatively new to R. I have a table of data consisting of an id, plus 3 values.
library(dplyr)
df <- tibble(id=c(1, 2, 3),val_a = c(13,25,42), val_b = c(25,30,0), val_c = c(7,27,21))
df
# A tibble: 3 × 4
id val_a val_b val_c
<dbl> <dbl> <dbl> <dbl>
1 1 13 25 7
2 2 25 30 27
3 3 42 0 21
I want to append another column with a code that depends on val_a, val_b, and val_c being 20 or greater. I did it this way:
df_1 <- df |>
mutate(val_code = paste0(ifelse(val_a >= 20, "a", ""),
ifelse(val_b >= 20, "b", ""),
ifelse(val_c >= 20, "c", "")
)
)
df_1
# A tibble: 3 × 5
id val_a val_b val_c val_code
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 13 25 7 b
2 2 25 30 27 abc
3 3 42 0 21 ac
My method yielded the desired results (for id = 1, only b>=20, for id = 2, all of a, b, and c are >= 20, and for id = 3, only a and c are >= 20), but it seems like there might be a more elegant way of accomplishing the same task. Any ideas?
Upvotes: 5
Views: 132
Reputation: 33603
Base R:
cols <- paste("val", c("a", "b", "c"), sep = "_")
cols_letters <- sub("val_", "", cols)
df$val_code <- apply(df[cols], 1, \(x) paste(cols_letters[x > 20], collapse = ""))
# # A tibble: 3 × 5
# id val_a val_b val_c val_code
# <dbl> <dbl> <dbl> <dbl> <chr>
# 1 1 13 25 7 b
# 2 2 25 30 27 abc
# 3 3 42 0 21 ac
Upvotes: 0
Reputation: 17656
OP added a comment:
"...For the dplyr approach in the middle, could you use the lookup table I described in a different comment (r lu <- tibble(var_name = c("val_a", "val_b", "val_c"), var_code = c("X","Y","Z")) ) instead of the gsub thing?"
Yes, the lookup table (lu
) can replace gsub
with a slight tweak to the original code:
lu <- tibble(var_name = c("val_a", "val_b", "val_c"), var_code = c("X","Y","Z"))
df %>%
rowwise() %>%
mutate(val_code = paste0(
lu$var_code[c_across(all_of(lu$var_name)) >= 20],
collapse = ""
))
# id val_a val_b val_c val_code
# <dbl> <dbl> <dbl> <dbl> <chr>
# 1 1 13 25 7 Y
# 2 2 25 30 27 XYZ
# 3 3 42 0 21 XZ
df %>%
rowwise() %>%
mutate(val_code = paste0(
lu$var_code[c_across(all_of(lu$var_name)) >= 20],
collapse = ""
))
In base R you can use apply
here to paste the names of the columns meeting the condition together, by row (see a more readable version at the bottom):
df$val_code <- apply(df[grep("val_", names(df))], 1, \(x) {
paste(gsub("val_", "", names(x))[x > 20], collapse = "")
})
For a dplyr
approach, although rowwise
is often discouraged, the only approach I could think of is using rowwise
with c_across
:
want_cols <- grep("val_", names(df), value = TRUE)
df %>%
rowwise() %>%
mutate(val_code = paste0(
gsub("val_", "", want_cols)[c_across(all_of(want_cols)) >= 20],
collapse = ""
))
Both approaches gave the same output:
# id val_a val_b val_c val_code
# <dbl> <dbl> <dbl> <dbl> <chr>
# 1 1 13 25 7 b
# 2 2 25 30 27 abc
# 3 3 42 0 21 ac
A bit more readable base R approach:
want_cols <- grep("val_", names(df))
want_names <- gsub("val_", "", names(df[want_cols]))
df$val <- apply(df[want_cols], 1, \(x){
paste(want_names[x > 20], collapse = "")
})
Upvotes: 2
Reputation: 93938
I think this is a good opportunity to make this a repeatable/modular function:
col_crit <- function(data, labels, FUN) {
sel <- FUN(data)
data[] <- labels[col(data)]
data[!sel] <- ""
do.call(paste0, data)
}
##if you are working with a tibble:
col_crit(as.data.frame(df[-1]), c("a","b","c"), FUN=\(x) x > 20)
##[1] "b" "abc" "ac"
## if you have a data.frame
col_crit(df[-1], c("a","b","c"), FUN=\(x) x > 20)
##[1] "b" "abc" "ac"
You have three inputs...
This code applies the function to the overall dataset to make a logical sel
ection, replaces the dataset with the labels
, blanks the values in the dataset that don't !
match the FUN
ction's criteria, and then collapses across all the columns.
Since it is only ever working at a whole dataset/matrix level and uses vectorised paste
ing of the results together, it should scale relatively well.
Upvotes: 3
Reputation: 102529
Given a lookup table
lu <- tibble(var_name = c("val_a", "val_b", "val_c"), var_code = c("X", "Y", "Z"))
you can try
df %>%
mutate(val_code = Reduce(
str_c,
across(
!id, ~ if_else(
.x >= 20,
with(lu, var_code[match(cur_column(), var_name)]),
""
)
)
))
such that
# A tibble: 3 × 5
id val_a val_b val_c val_code
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 13 25 7 Y
2 2 25 30 27 XYZ
3 3 42 0 21 XZ
You don't have to code the value row by row, but could iteratively accumulate the values column by column.
You can try Reduce
+ across
like below
df %>%
mutate(val_code = Reduce(
str_c,
across(!id, ~ if_else(
.x >= 20,
sub(".*_", "", cur_column()),
""
))
))
which gives
id val_a val_b val_c val_code
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 13 25 7 b
2 2 25 30 27 abc
3 3 42 0 21 ac
Upvotes: 3
Reputation: 66880
Longer but hopefully self-explanatory. Take the data and join to it a version of itself where it is reshaped longer, filtered for >= 20, and summarized to combine for each id
the column names with val_
removed.
library(tidyverse)
left_join(df, df |>
pivot_longer(-id) |>
filter(value >= 20) |>
summarize(val_code = paste0(name |> str_remove("val_"), collapse = ""), .by = id))
Result
Joining with `by = join_by(id)`
# A tibble: 3 × 5
id val_a val_b val_c val_code
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 13 25 7 b
2 2 25 30 27 abc
3 3 42 0 21 ac
Upvotes: 5