Reputation: 51

Add a column in R based on other columns values

I'm relatively new to R. I have a table of data consisting of an id, plus 3 values.

library(dplyr)

df <- tibble(id=c(1, 2, 3),val_a = c(13,25,42), val_b = c(25,30,0), val_c = c(7,27,21))
df
# A tibble: 3 × 4
     id val_a val_b val_c
  <dbl> <dbl> <dbl> <dbl>
1     1    13    25     7
2     2    25    30    27
3     3    42     0    21

I want to append another column with a code that depends on val_a, val_b, and val_c being 20 or greater. I did it this way:

df_1 <- df |> 
  mutate(val_code = paste0(ifelse(val_a >= 20, "a", ""), 
                           ifelse(val_b >= 20, "b", ""), 
                           ifelse(val_c >= 20, "c", "")
  )
  )

df_1
# A tibble: 3 × 5
     id val_a val_b val_c val_code
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1    13    25     7 b       
2     2    25    30    27 abc     
3     3    42     0    21 ac

My method yielded the desired results (for id = 1, only b>=20, for id = 2, all of a, b, and c are >= 20, and for id = 3, only a and c are >= 20), but it seems like there might be a more elegant way of accomplishing the same task. Any ideas?

Upvotes: 5

Answers (5)

s_baldur

Reputation: 33603

Base R:

cols <- paste("val", c("a", "b", "c"), sep = "_")
cols_letters <- sub("val_", "", cols)

df$val_code <- apply(df[cols], 1, \(x) paste(cols_letters[x > 20], collapse = ""))

# # A tibble: 3 × 5
#      id val_a val_b val_c val_code
#   <dbl> <dbl> <dbl> <dbl> <chr>   
# 1     1    13    25     7 b       
# 2     2    25    30    27 abc     
# 3     3    42     0    21 ac

Upvotes: 0

jpsmith

Reputation: 17656

Edit based on comment

OP added a comment:

"...For the dplyr approach in the middle, could you use the lookup table I described in a different comment (r lu <- tibble(var_name = c("val_a", "val_b", "val_c"), var_code = c("X","Y","Z")) ) instead of the gsub thing?"

Yes, the lookup table (lu) can replace gsub with a slight tweak to the original code:

lu <- tibble(var_name = c("val_a", "val_b", "val_c"), var_code = c("X","Y","Z"))

df %>%
  rowwise() %>%
  mutate(val_code = paste0(
    lu$var_code[c_across(all_of(lu$var_name)) >= 20],
    collapse = ""
  ))


# id val_a val_b val_c val_code
# <dbl> <dbl> <dbl> <dbl> <chr>   
#   1     1    13    25     7 Y       
#   2     2    25    30    27 XYZ     
#   3     3    42     0    21 XZ

Original answer

df %>%
  rowwise() %>%
  mutate(val_code = paste0(
    lu$var_code[c_across(all_of(lu$var_name)) >= 20],
    collapse = ""
  ))

In base R you can use apply here to paste the names of the columns meeting the condition together, by row (see a more readable version at the bottom):

df$val_code <- apply(df[grep("val_", names(df))], 1, \(x) {
  paste(gsub("val_", "", names(x))[x > 20], collapse = "")
})

For a dplyr approach, although rowwise is often discouraged, the only approach I could think of is using rowwise with c_across:

want_cols <- grep("val_", names(df), value = TRUE)

df %>%
  rowwise() %>%
  mutate(val_code = paste0(
    gsub("val_", "", want_cols)[c_across(all_of(want_cols)) >= 20],
    collapse = ""
  ))

Both approaches gave the same output:

#     id val_a val_b val_c val_code
#   <dbl> <dbl> <dbl> <dbl> <chr>   
# 1     1    13    25     7 b       
# 2     2    25    30    27 abc     
# 3     3    42     0    21 ac

A bit more readable base R approach:

want_cols <- grep("val_", names(df))
want_names <- gsub("val_", "", names(df[want_cols]))

df$val <- apply(df[want_cols], 1, \(x){
  paste(want_names[x > 20], collapse = "")
})

Upvotes: 2

thelatemail

Reputation: 93938

I think this is a good opportunity to make this a repeatable/modular function:

col_crit <- function(data, labels, FUN) {
    sel <- FUN(data)
    data[] <- labels[col(data)]
    data[!sel] <- ""
    do.call(paste0, data)
}

##if you are working with a tibble:
col_crit(as.data.frame(df[-1]), c("a","b","c"), FUN=\(x) x > 20)
##[1] "b"   "abc" "ac" 

## if you have a data.frame
col_crit(df[-1], c("a","b","c"), FUN=\(x) x > 20)
##[1] "b"   "abc" "ac"

You have three inputs...

a dataset with N columns
(single?) character labels for N columns in the output
a function applied to select labels for the matching columns

This code applies the function to the overall dataset to make a logical selection, replaces the dataset with the labels, blanks the values in the dataset that don't ! match the FUNction's criteria, and then collapses across all the columns.

Since it is only ever working at a whole dataset/matrix level and uses vectorised pasteing of the results together, it should scale relatively well.

Upvotes: 3

ThomasIsCoding

Reputation: 102529

Update

Given a lookup table

lu <- tibble(var_name = c("val_a", "val_b", "val_c"), var_code = c("X", "Y", "Z"))

you can try

df %>%
    mutate(val_code = Reduce(
        str_c,
        across(
            !id, ~ if_else(
                .x >= 20,
                with(lu, var_code[match(cur_column(), var_name)]),
                ""
            )
        )
    ))

such that

# A tibble: 3 × 5
     id val_a val_b val_c val_code
  <dbl> <dbl> <dbl> <dbl> <chr>
1     1    13    25     7 Y
2     2    25    30    27 XYZ
3     3    42     0    21 XZ

Older

You don't have to code the value row by row, but could iteratively accumulate the values column by column.

You can try Reduce + across like below

df %>%
    mutate(val_code = Reduce(
        str_c,
        across(!id, ~ if_else(
            .x >= 20,
            sub(".*_", "", cur_column()),
            ""
        ))
    ))

which gives

     id val_a val_b val_c val_code
  <dbl> <dbl> <dbl> <dbl> <chr>
1     1    13    25     7 b
2     2    25    30    27 abc     
3     3    42     0    21 ac

Upvotes: 3

Jon Spring

Reputation: 66880

Longer but hopefully self-explanatory. Take the data and join to it a version of itself where it is reshaped longer, filtered for >= 20, and summarized to combine for each id the column names with val_ removed.

library(tidyverse)
left_join(df, df |>
  pivot_longer(-id) |>
  filter(value >= 20) |>
  summarize(val_code = paste0(name |> str_remove("val_"), collapse = ""), .by = id))

Result

Joining with `by = join_by(id)`
# A tibble: 3 × 5
     id val_a val_b val_c val_code
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1    13    25     7 b       
2     2    25    30    27 abc     
3     3    42     0    21 ac