NewBee

Reputation: 1040

Preparing data for tbl_summary: convert to numeric without losing label attribute

I have read a survey CSV file with over 500 variables into R, with every column read as character. I used library(labelled) to assign variable labels and value labels to most of these variables, and I then pass the result to tbl_summary().

See the labelled cheatsheet: https://raw.githubusercontent.com/larmarange/labelled/master/cheatsheet/labelled_cheatsheet.pdf

To assign variable labels, I first labelled all the variables by passing a named list (one element per variable) to var_label(). Next, I added value labels to most variables using set_value_labels(). For the value labels to be accepted, I had to make sure that every single variable was a character.

The problem: Once I have labelled all the variables and values of interest, I cannot seem to convert to numeric without dropping the label.

Here is a minimal example of what I am trying.

Simulate reading the CSV file as strings:

library(dplyr); library(labelled); library(gtsummary)
mtcars2 <- mtcars %>% mutate_if(is.numeric, as.character)

Assign variable labels to some variables:

var_label(mtcars2) <- list(mpg = "Miles Per Gallon", cyl = "Cylinder", disp = "Displacement")
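var_label() stores each variable label in an ordinary "label" attribute on the column, which is also where tbl_summary() reads labels from. A minimal sketch on a hypothetical one-column data frame:

```r
library(labelled)

# Hypothetical toy data: a character column, as if read from a CSV
df <- data.frame(mpg = c("21", "22.8"), stringsAsFactors = FALSE)

# Assign variable labels from a named list (one element per variable)
var_label(df) <- list(mpg = "Miles Per Gallon")

# The label is a plain attribute; the column itself is still a bare character vector
attr(df$mpg, "label")
class(df$mpg)
```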

Assign value labels to some variables. For this to work, ALL the variables need to be character (hence why I read in the CSV as strings). Otherwise I get the error: Error: Can't convert `labels` <character> to match type of `x` <double>.

# names are the labels, values are the codes stored in the (character) data
mtcars3 <- mtcars2 %>% set_value_labels(cyl = c(four = "4", six = "6", eight = "8")) %>% haven::as_factor(.)

haven::as_factor(.) is supposed to retain the labels, but it does not (perhaps because I labelled using a different package? Perhaps because these are not Stata labels, but manually coded labels?)
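In labelled, the names of a value-label vector are the labels and its elements are the codes stored in the data. The codes must have the same type as the column, which is why character codes are rejected for a double column. A minimal sketch, assuming the original numeric mtcars: numeric codes let you add value labels without converting anything to character.

```r
library(labelled)

# cyl is double in mtcars, so the codes in the value-label vector are double too.
# Names are the labels; values are the codes that appear in the data.
mt <- set_value_labels(mtcars, cyl = c(four = 4, six = 6, eight = 8))

val_labels(mt$cyl)
class(mt$cyl)   # includes "haven_labelled"

# The underlying data stay numeric and can still be summarised as numbers
mean(unclass(mt$cyl))
```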

mtcars4 <- mtcars3 %>% mutate(mpg = as.numeric(mpg))

Running tbl_summary() on mtcars3 (all strings) keeps all the labels and their values (great!), but every variable is summarised as categorical, so I do not get summary statistics such as the mean for the variables I consider numeric.

tbl_summary(mtcars3)

I convert one variable to numeric so that it is summarised as continuous (e.g. with its mean) rather than as categorical. But now the label is dropped from every variable that I converted to numeric.

tbl_summary(mtcars4)
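The label disappears because as.numeric() returns a bare vector and drops all attributes, including "label". A base-R sketch of the effect, and of putting the label back by hand (the values are illustrative):

```r
# A character column carrying a variable label, as var_label() would leave it
mpg <- c("21", "22.8", "18.7")
attr(mpg, "label") <- "Miles Per Gallon"

mpg_num <- as.numeric(mpg)
attr(mpg_num, "label")        # NULL: as.numeric() dropped every attribute

# Re-attach only the variable label after the conversion
attr(mpg_num, "label") <- attr(mpg, "label")
attr(mpg_num, "label")
```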

How do I convert some variables of mtcars3 into numeric without dropping the label attributes?

The class of a column in my actual data before I apply haven::as_factor(.):

> class(survey_clean2$Q51_5)
[1] "haven_labelled" "vctrs_vctr"     "character"

The class of the same column after I apply haven::as_factor(.):

> class(survey_clean2$Q51_5)
[1] "factor"

Here is a better example with some data:

library(readr)

# na = "<NA>" so the printed missing values are read as NA, not as the string "<NA>"
data_in <- read_table2("Q50_2     Q50_3  Q85 Q56
1    <NA>      <NA> <NA>
2    <NA>      <NA> <NA>
3    <NA>      <NA> <NA>
<NA>  Rarely Sometimes   12
5    <NA>      <NA> <NA>
6    <NA>      <NA> <NA>
7    <NA>      <NA> <NA>
8    <NA>      <NA> <NA>
9    <NA>      <NA> <NA>
10  Often Sometimes  65", na = "<NA>")

We have to convert it all to character because that is the actual format of my processed data.

data_in <- data_in %>% mutate_all(as.character)
str(data_in)

This does not summarise the numeric variables as continuous:

tbl_summary(data_in)

So I convert Q50_2 to numeric, trying to keep its label:

data_in_num <- data_in %>%
  mutate(Q50_2 = labelled::labelled(as.numeric(Q50_2), label = attr(Q50_2, "label")))

str(data_in_num)

But that variable is omitted from the tbl_summary() output:

tbl_summary(data_in_num)
Column(s) ‘Q50_2’ omitted from output.
Accepted classes are ‘character’, ‘factor’, ‘numeric’, ‘logical’, ‘integer’, or ‘difftime’.
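labelled::labelled() does more than attach a label: it wraps the vector in the haven_labelled class, and that class is not in the accepted list printed above. Setting only the "label" attribute (here with var_label()) keeps the column a plain numeric. A sketch with made-up values and a hypothetical label text:

```r
library(labelled)

x <- as.numeric(c("1", "2", NA, "5"))

# labelled() adds the haven_labelled class on top of double
x_lab <- labelled::labelled(x, label = "Q50_2 label")
class(x_lab)

# var_label<- only sets the "label" attribute; the class stays "numeric"
x_plain <- x
var_label(x_plain) <- "Q50_2 label"
class(x_plain)
var_label(x_plain)
```

In a pipeline, the equivalent is to convert with mutate(Q50_2 = as.numeric(Q50_2)) and then add the label with labelled::set_variable_labels().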

Upvotes: 1

Views: 2173

Answers (1)

Daniel D. Sjoberg

Reputation: 11595

There is a fantastic function from the labelled package that helps in these scenarios: copy_labels_from() https://larmarange.github.io/labelled/reference/copy_labels.html

You can use it to re-apply labels after they've been stripped by an operation. See example below.

library(gtsummary)
library(tidyverse)

# any operation that results in a loss of labels
trial2 <-
  trial %>%
  select(age, marker) %>%
  purrr::map_dfc(as.numeric)

trial2 %>%
  # copy the labels from the original data frame
  labelled::copy_labels_from(trial) %>%
  tbl_summary()

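The same pattern should apply to all-character data like in the question: convert the columns, then copy the variable labels back from the pre-conversion data frame. A sketch, assuming dplyr >= 1.0 for across() and using mtcars as stand-in survey data:

```r
library(dplyr)
library(labelled)

# Simulate the survey: every column read in as character, some with labels
mt_chr <- mtcars %>% mutate(across(everything(), as.character))
var_label(mt_chr) <- list(mpg = "Miles Per Gallon", disp = "Displacement")

mt_num <- mt_chr %>%
  mutate(across(c(mpg, disp), as.numeric)) %>%  # as.numeric() strips the labels
  copy_labels_from(mt_chr)                       # copy them back, matched by column name

var_label(mt_num$mpg)
class(mt_num$mpg)
```

The result can then go straight into tbl_summary(), e.g. tbl_summary(select(mt_num, mpg, disp)).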

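Here is an alternative that avoids value labels altogether: convert cyl to a factor with base R's factor(), then set only the variable labels: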

library(tidyverse)
library(gtsummary)

mtcars %>% 
  mutate(
    cyl = factor(cyl, levels = c(4, 6, 8), labels = c("four", "six", "eight"))
  ) %>%
  labelled::set_variable_labels(mpg = "Miles Per Gallon", 
                                cyl = "Cylinder", 
                                disp = "Displacement") %>%
  select(mpg, cyl, disp) %>%
  tbl_summary()

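Another option is to do the conversion with a small helper that keeps the variable label as it goes. as_numeric_keep_label() below is a hypothetical helper written for illustration, not a function from labelled or gtsummary:

```r
library(dplyr)

# Hypothetical helper: convert to numeric but keep the "label" attribute.
as_numeric_keep_label <- function(x) {
  lbl <- attr(x, "label", exact = TRUE)   # variable label (may be NULL)
  if (is.factor(x)) x <- as.character(x)  # use the level text, not the integer codes
  out <- as.numeric(unclass(x))           # unclass() drops classes such as haven_labelled
  attr(out, "label") <- lbl               # re-attach only the variable label (no-op if NULL)
  out
}

mt_chr <- mtcars %>% mutate(across(everything(), as.character))
attr(mt_chr$mpg, "label") <- "Miles Per Gallon"

mt_num <- mt_chr %>% mutate(across(c(mpg, disp), as_numeric_keep_label))
```

Because the result is a bare numeric vector with a "label" attribute, its class is plain "numeric", which tbl_summary() accepts.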

Upvotes: 3
