PinkyL

Reputation: 351

Extract data labels in R from attr and add as column to correspond to the variable/column name

I have a very large data set with variable names that are super abbreviated and it would help immensely if the label in the attr(*, "label") section was extracted and showed up in the column beside the corresponding variable.

library(Hmisc)  # provides label() and label<-
label(mtcars[["mpg"]]) <- "Miles/(US) gallon"
label(mtcars[["hp"]]) <- "Gross horsepower"
label(mtcars[["wt"]]) <- "Weight (1000lbs)"

My current code just gets the mean/sd from the entire data set:

mtcars %>%
  select(mpg, hp, wt) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(mean = mean(value, na.rm = TRUE), sd = sd(value, na.rm = TRUE))

But I want a column with the label of the variables so it's easier to tell:

name  mean   sd     label
hp    147    68.6   Gross horsepower
mpg   20.1   6.03   Miles/(US) gallon
wt    3.22   0.978  Weight (1000lbs)

I found a thread that sort of gets to what I want, but if I add mutate(labels=label(mtcars)[name]) at the end of the code, I get a column with NA instead of the labels.

Upvotes: 1

Views: 642

Answers (1)

akrun

Reputation: 887571

We can use imap from purrr, which passes each column as .x and its name as .y:

library(purrr)
library(dplyr)
library(Hmisc)
imap_dfr(mtcars[c('hp', 'mpg', 'wt')], ~
      # .x is the whole column vector, .y is the column name
      tibble(name = .y, mean = mean(.x, na.rm = TRUE),
             sd = sd(.x, na.rm = TRUE),
             label = attr(.x, 'label')))

If we stay closer to the OP's method, we can also use summarise_all first and then reshape with pivot_longer:

library(tidyr)
mtcars %>%
    dplyr::select(mpg, hp, wt) %>%
    summarise_all(list(mean = ~mean(., na.rm = TRUE),
                       sd = ~sd(., na.rm = TRUE),
                       label = ~attr(., 'label'))) %>%
    mutate(rn = 1) %>%
    pivot_longer(cols = -rn, names_to = c('name', '.value'), names_sep = "_") %>%
    select(-rn)
#  name      mean         sd             label
#1  mpg  20.09062  6.0269481 Miles/(US) gallon
#2   hp 146.68750 68.5628685  Gross horsepower
#3   wt   3.21725  0.9784574  Weight (1000lbs)
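
If you would rather skip the reshape altogether, the same table can be built in base R by looking up each column's "label" attribute directly. This is only a minimal sketch, not the method above: `df`, `vars`, `get_label` and `summary_tbl` are names made up for the example, and the labels are set with `attr<-` (the attribute that Hmisc's `label<-` writes to) so the snippet runs without extra packages.

```r
# Work on a copy so the built-in mtcars is left untouched
df <- mtcars

# Assumption: labels live in the "label" attribute of each column
attr(df$mpg, "label") <- "Miles/(US) gallon"
attr(df$hp,  "label") <- "Gross horsepower"
attr(df$wt,  "label") <- "Weight (1000lbs)"

vars <- c("hp", "mpg", "wt")

# Hypothetical helper: return a column's label, or NA if it has none
get_label <- function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else as.character(lab)
}

# One row per variable: name, mean, sd and the label looked up by column
summary_tbl <- data.frame(
  name  = vars,
  mean  = vapply(vars, function(v) mean(df[[v]], na.rm = TRUE), numeric(1)),
  sd    = vapply(vars, function(v) sd(df[[v]], na.rm = TRUE), numeric(1)),
  label = vapply(vars, function(v) get_label(df[[v]]), character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(summary_tbl)
```

Because the label is read from the original column rather than from the reshaped data, it does not matter whether an intermediate step drops attributes; columns without a label simply get NA.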

Upvotes: 2
