Massimo2013
Massimo2013

Reputation: 593

Access column names in a function to be used in dplyr::mutate

Using OECD data, I can retrieve a database where variables are specified by their IDs and a list with the corresponding labels. Here is a minimal example that reproduces the data structure:

df <- tibble(LOCATION=c("DEU","ITA","USA"),UNIT=c("DEU","EUR","USD"),value=c(120,140,160))
df
## A tibble: 3 x 3
#> LOCATION UNIT  value
#> <chr>    <chr> <dbl>
#> 1 DEU      DEU     120
#> 2 ITA      EUR     140
#> 3 USA      USD     160

df_labels <- list(LOCATION = data.frame(id =c("DEU","ITA","USA"), 
                                        label=c("Germany","Italy","United States")), 
                  UNIT = data.frame(id=c("USD","EUR"),
                                    label=c("Dollar","Euro")))
df_labels
#> $LOCATION
#>    id         label
#> 1 DEU       Germany
#> 2 ITA         Italy
#> 3 USA United States
#> 
#> $UNIT
#>    id  label
#> 1 USD Dollar
#> 2 EUR   Euro

What I want to do is to replace the IDs in variables LOCATION and UNIT in df with the corresponding labels provided in df_labels. I defined the following function:

get_labels <- function(x,db) {
     variable = deparse(substitute(x))
     return(factor(x,levels=db[[variable]]$id,labels=db[[variable]]$label))
     }

so that I can use it in mutate as follows:

df %>% mutate(LOCATION = get_labels(LOCATION,df_labels),
              UNIT = get_labels(UNIT,df_labels))
## A tibble: 3 x 3
#>   LOCATION      UNIT   value
#>   <fct>         <fct>  <dbl>
#> 1 Germany       Euro     120
#> 2 Italy         Euro     140
#> 3 United States Dollar   160

However, I haven't been able to use the function across multiple columns. If I try it using across:

df %>% mutate(across(where(is.character), ~get_labels(.,df_labels)))

the result is an NA in the affected columns. Apparently, the problem is with deparse(substitute(.)), which does not capture the column names. Unfortunately, looking at similar questions such as this one didn't help.

Upvotes: 0

Views: 675

Answers (2)

akrun
akrun

Reputation: 887831

We could do this with deframe and map

library(purrr)
library(dplyr)
library(tibble)
lst1 <- map(df_labels, deframe)
for(nm in names(lst1))  df <- df %>%
           mutate(!! nm := lst1[[nm]][!! rlang::sym(nm)])

-output

df
# A tibble: 3 x 3
#  LOCATION      UNIT   value
#  <chr>         <chr>  <dbl>
#1 Germany       <NA>     120
#2 Italy         Euro     140
#3 United States Dollar   160

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389235

across will not have access to column names so deparse, substitute trick will not work. Try this version :

get_labels <- function(x, y, db) {
  return(factor(x,levels=db[[y]]$id,labels=db[[y]]$label))
}

cols <- sapply(df, is.character)
df[cols] <- purrr::imap_dfc(df[cols], get_labels, db = df_labels)
df
# A tibble: 3 x 3
#  LOCATION      UNIT   value
#  <fct>         <fct>  <dbl>
#1 Germany       NA       120
#2 Italy         Euro     140
#3 United States Dollar   160

Upvotes: 1

Related Questions