Rename a column only if all its values satisfy a condition

Question

I want to rename columns in my data based on whether all values in those columns satisfy a condition. For example, in a numeric column, if all values are greater than 5, rename the column to large_values, otherwise keep the column name as is. Another example, if all values in a character column are free from the & character, rename the column to no_ampersand.

Data

library(tibble)

df <- 
  tribble(~age_1, ~age_2, ~string_1,
        2, 7, "abc",
        3, 8, "efg",
        1, 11, "hi&",
        10, 6, "klmn",
        50, 100, "opq")

Defining two functions for demonstration

Are vector elements greater than a value

is_larger_than_val <- function(x, y) {
  all(x > y)
}

Do vector elements contain &?

does_contain_ampersand <- function(x) {
  all(grepl("&", x))
}

The question

So using those functions, how can I rename headers of df such that when

is_larger_than_val(df$age_2, 5)

[1] TRUE

Will rename age_2 to large_values, but otherwise in case when

is_larger_than_val(df$age_1, 5)

[1] FALSE

will keep age_1 as is?

And similarly, because

does_contain_ampersand(df$string_1)

[1] FALSE

will keep string_1 as is (but if it had been TRUE then string_1 would have been renamed to no_ampersand)?

Desired Output

Given the current data, renaming based on the conditions I specified in is_larger_than_val() and does_contain_ampersand() should return:

# A tibble: 5 x 3
  age_1 large_values string_1
              
1     2            7 abc     
2     3            8 efg     
3     1           11 hi&     
4    10            6 klmn    
5    50          100 opq

I'm sure this could be achieved with nesting some if else statements, but am wondering if there is a simpler way (maybe using tidyverse tricks?).

Thanks!

EDIT

Putting the question in context

Following @Ian's comment below, here's the context for my problem.

I have an R data object that originates from a JSON file. Once I read the JSON file into R, it ends up in the following format:

vec <- c(1, 2, 3)
names(vec) <- c("A", "B", "C")
my_data_object_as_list <- as.list(vec)

my_data_object_as_list
## $A
## [1] 1

## $B
## [1] 2

## $C
## [1] 3

I want to build a function that takes my_data_object_as_list and reorganizes it into a table.

require(tidyr)
require(dplyr)
require(tidyselect)

organize_in_table <- function(as_list_object) {
    as_list_object %>%
    bind_rows() %>%
    pivot_longer(cols = tidyselect::everything())
}

organize_in_table(my_data_object_as_list)

## # A tibble: 3 x 2
##   name  value
##    
## 1 A         1
## 2 B         2
## 3 C         3

Because organize_in_table() is pretty generic, it returns a table with column names (name and value) that are not indicative to what each column is about. To address this problem, I want to add a purpose argument to organize_in_table(), and here is one example:

organize_in_table <-
  function(as_list_object,
           purpose = NULL) {
    table <- as_list_object %>%
      bind_rows() %>%
      pivot_longer(cols = tidyselect::everything())
    
    if (is.null(purpose)) {
      return(table)
    } else if (purpose == "match_letters_and_numbers") {
      table <- rename(table, letters = name, numbers = value)
    }
    return(table)
  }

Now, organize_in_table() can return an object that has meaningful names:

df_letters_and_numbers <- 
  organize_in_table(my_data_object_as_list, "match_letters_and_numbers")

> df_letters_and_numbers
## # A tibble: 3 x 2
##   letters numbers
##        
## 1 A             1
## 2 B             2
## 3 C             3

Here I encounter a problem: How do I know that df_letters_and_numbers[1] has been correctly named as letters whereas df_letters_and_numbers[2] is numbers? On what basis have I verified — when I renamed table within organize_in_table() — that the first column (name) is all letters and the second column (value) is all numbers? I didn't verify.

What if my_data_object_as_list had looked like this:

vec_2 <- c("A", "B", "C")
names(vec_2) <- c(1, 2, 3)
my_data_object_as_list_2 <- as.list(vec_2)

> my_data_object_as_list_2 
## $`1`
## [1] "A"

## $`2`
## [1] "B"

## $`3`
## [1] "C"

Then if I would run organize_in_table(my_data_object_as_list_2, "match_letters_and_numbers") I will get the dataframe with mismatching column names:

## # A tibble: 3 x 2
##   letters numbers
##        
## 1 1       A      
## 2 2       B      
## 3 3       C

Bottom line

So my conclusion is that I have to condition the renaming step within organize_in_table(), based on the content of each column being renamed. To that end, I thought that the first step should be to define a separate function for each test I want to perform (e.g., are all values in the column numbers? are all values letters?). And because I want to make organize_in_table() as scalable as I can, I would like it accept any test that I can build a testing function for.

Rui Barradas · Accepted Answer

Use sapply on the data in order to create a logical index on the columns, but be careful with the character columns in the ampersand case:

i <- sapply(df, is_larger_than_val, y = 5)
names(df)[i] <- "large_values"

i <- sapply(df, does_contain_ampersand)
i <- i | !sapply(df, is.character)
names(df)[!i] <- "no_ampersand"

names(df)
#[1] "age_1"        "large_values" "no_ampersand"