Reputation: 1040
I have read in a CSV survey with over 500 variables (as characters). I used library(labelled) to assign variable labels and value labels for most of these variables. I then pass the result into tbl_summary.
See here: https://raw.githubusercontent.com/larmarange/labelled/master/cheatsheet/labelled_cheatsheet.pdf
In order to assign variable labels and values I first labelled all the variables by passing in a list for each variable. Next, I added value labels for most variables using "set_value_labels". In order to add value labels for the variables, I had to make sure that every single variable was a character.
The problem: Once I have labelled all the variables and values of interest, I cannot seem to convert to numeric without dropping the label.
Here is a minimal example of what I am trying.
Read in the CSV file as strings:
mtcars2 <- mtcars %>% mutate_if(is.numeric,as.character)
Assign labels to some variables:
var_label(mtcars2) <- list(mpg = "Miles Per Gallon", cyl = "Cylinder", disp = "Displacement")
Assign value labels to some variables. In order for this to work, ALL the variables need to be character (hence why I read in the CSV as strings). If not, I get the error:
Error: Can't convert `labels` <character> to match type of `x` <double>.
mtcars3 <- mtcars2 %>% set_value_labels(cyl=c(`4`="four",`6`="six",`8`="eight")) %>% haven::as_factor(.)
haven::as_factor(.) is supposed to retain the labels, but it does not (perhaps because I labelled using a different package? Perhaps because these are not Stata labels, but manually coded labels?)
mtcars4 <- mtcars3 %>% mutate(mpg = as.numeric(mpg))
Running tbl_summary on mtcars3 (all strings) keeps all the labels and their values (great!), but it does not provide mean values for the variables that I consider numeric.
tbl_summary(mtcars3)
I convert one variable to numeric so that I can see the mean value, rather than categorical. But now, the label drops away from every variable that I converted to numeric.
tbl_summary(mtcars4)
How do I convert some variables of mtcars3 into numeric without dropping the label attributes?
The class of the actual data before I apply haven::as_factor(.):
> class(survey_clean2$Q51_5)
[1] "haven_labelled" "vctrs_vctr" "character"
Class of the data after I apply haven::as_factor(.):
class(survey_clean2$Q51_5)
[1] "factor"
Here is a better example with some data
data_nonlab <- read_table2("Q50_2 Q50_3 Q85 Q56
1 <NA> <NA> <NA>
2 <NA> <NA> <NA>
3 <NA> <NA> <NA>
<NA> Rarely Sometimes 12
5 <NA> <NA> <NA>
6 <NA> <NA> <NA>
7 <NA> <NA> <NA>
8 <NA> <NA> <NA>
9 <NA> <NA> <NA>
10 Often Sometimes 65")
We have to convert it all to character because that is the actual format of my processed data.
data_in <- data_nonlab %>% mutate_all(as.character)
str(data_nonlab)
This does not work for numeric vars.
tbl_summary(data_in)
So I convert it to numeric.
data_in_num <-
data_nonlab %>% mutate(Q50_2 = labelled::labelled(as.numeric(Q50_2), label = attr(Q50_2, "label")))
str(data_in_num)
But that variable is omitted from the tbl_summary output, with this message:
tbl_summary(data_in_num)
Column(s) ‘Q50_2’ omitted from output.
Accepted classes are ‘character’, ‘factor’, ‘numeric’, ‘logical’, ‘integer’, or ‘difftime’.
Upvotes: 1
Views: 2173
Reputation: 11595
There is a fantastic function from the labelled package that helps in these scenarios: copy_labels_from()
https://larmarange.github.io/labelled/reference/copy_labels.html
You can use it to re-apply labels after they've been stripped by an operation. See example below.
library(gtsummary)
library(tidyverse)
# any operation that results in a loss of labels
trial2 <-
trial %>%
select(age, marker) %>%
purrr::map_dfc(as.numeric)
trial2 %>%
# copy the labels from the original data frame
labelled::copy_labels_from(trial) %>%
tbl_summary()
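Applied to the mtcars example from the question, the same idea might look like the sketch below. It assumes the variable labels were set on the all-character data frame (mtcars2 in the question) and that only mpg needs to become numeric; the value labels on cyl are left out to keep it short.

```r
library(dplyr)
library(labelled)

# Recreate the all-character data frame from the question
mtcars2 <- mtcars %>% mutate_if(is.numeric, as.character)
var_label(mtcars2) <- list(mpg = "Miles Per Gallon",
                           cyl = "Cylinder",
                           disp = "Displacement")

# as.numeric() strips the "label" attribute from mpg ...
mtcars4 <- mtcars2 %>%
  mutate(mpg = as.numeric(mpg)) %>%
  # ... so copy the variable labels back from the labelled data frame
  copy_labels_from(mtcars2)

var_label(mtcars4$mpg)
class(mtcars4$mpg)
```

Because mpg ends up as a plain numeric vector carrying a label attribute, passing mtcars4 to tbl_summary() should give a continuous summary for mpg with its label intact.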
Here is an alternative that uses base R's factor() for the value labels:
library(tidyverse)
library(gtsummary)
mtcars %>%
mutate(
cyl = factor(cyl, levels = c(4, 6, 8), labels = c("four", "six", "eight"))
) %>%
labelled::set_variable_labels(mpg = "Miles Per Gallon",
cyl = "Cylinder",
disp = "Displacement") %>%
select(mpg, cyl, disp) %>%
tbl_summary()
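On the second example in the question: the "omitted from output" message appears to come from wrapping the column in labelled::labelled(), which returns a haven_labelled vector, a class that is not in the accepted list. A minimal sketch of the difference, using a made-up vector q and a hypothetical label text in place of the real survey column:

```r
library(labelled)

# Hypothetical stand-in for one survey column stored as character
q <- c("1", "2", NA, "10")
var_label(q) <- "Hypothetical question label"

# labelled() builds a haven_labelled vector, which tbl_summary() skips
q_lab <- labelled(as.numeric(q), label = var_label(q))
class(q_lab)

# Setting only the variable label keeps a plain numeric vector
q_num <- as.numeric(q)
var_label(q_num) <- var_label(q)
class(q_num)
```

The second form keeps the variable label as an attribute while the class stays numeric, which is one of the classes tbl_summary() accepts.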
Upvotes: 3