Reputation: 11
I am new to R and have a beginner's question.
I have a raw dataset with around 1400 variables that have automated names like "row1__1". Most variables are 0/1 dummies; some contain text. The variable names say little about what each variable contains, and they are not useful in output such as a frequency table.
I am looking for an easy way to add labels to columns so that instead of getting a table like:
0 1
5 10
I would get output like:
Is high quality data available?
| | Not available | Available | Total |
|---|---|---|---|
| Count | 5 | 10 | 15 |
Is there a way to tell R to use a label format each time a particular variable is used?
I have a separate table that maps variable names to their labels:
| Variable name | Label |
|---|---|
| row1__1 | Is high quality data available? |
| ... | ... |
I don't have any useful raw data on observation labels (0=Not available and 1=Available) but I can make a dimension table for that.
I am sorry for asking such a basic question, but I can't seem to find any answers online. I am hoping for a solution that imports the labels, rather than adding them manually each time I build a frequency table or similar output.
Kind regards,
Joe
I tried using table(). Here is a sample of the data:
tab1_d1 | tab1_d2 | tab1_d3 | tab1_cd_1 | tab1_cd_2 | tab1_cd_3 | tab1_cvd2 |
---|---|---|---|---|---|---|
1 | 1 | 0 | 0 | 1 | 0 | 1 |
0 | 1 | 0 | 0 | 1 | 0 | 1 |
1 | 0 | 0 | 1 | 0 | 0 | 0 |
Upvotes: 0
Views: 459
Reputation: 8107
R doesn't provide nice-looking tables out of the box; for reader-facing output you'd have to use a package, and there are a variety of them out there. Here's one solution using the gt package:
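library(gt)
library(dplyr)

# Counts for one variable: N = "Not available", Y = "Available"
dat <- data.frame(N = 5, Y = 10)

dat |>
  mutate(Total = N + Y) |>                        # add the row total
  rename(`Not available` = N, Available = Y) |>   # readable column headers
  gt() |>
  tab_header("Is high quality data available?")   # variable label as the title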
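For quick checks at the console, a labelled frequency table is also possible in base R without extra packages. This is a minimal sketch, assuming the column is coded 0/1; the vector `x` and the names `x_lab`, `counts`, and `out` are made up for illustration:

```r
# Hypothetical 0/1 dummy: 5 "not available" and 10 "available"
x <- c(rep(0, 5), rep(1, 10))

# Attach value labels by converting to a factor with explicit levels
x_lab <- factor(x, levels = c(0, 1), labels = c("Not available", "Available"))

# Count each level, then append a total as a named vector
counts <- table(x_lab)
out <- setNames(
  c(as.vector(counts), sum(counts)),
  c(names(counts), "Total")
)
out
```

Declaring both levels in `factor()` means a level that never occurs still shows up with a count of 0.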
EDIT: You can repeat this process across all variables and save the resulting tables in a document, using something like the Rmd file below:
---
title: "Untitled"
author: "me"
date: "2023-01-23"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
```{r}
library(dplyr)
library(gt)
# Sample data from the question
mydf <- tibble::tribble(
  ~tab1_d1, ~tab1_d2, ~tab1_d3, ~tab1_cd_1, ~tab1_cd_2, ~tab1_cd_3, ~tab1_cvd2,
  1, 1, 0, 0, 1, 0, 1,
  0, 1, 0, 0, 1, 0, 1,
  1, 0, 0, 1, 0, 0, 0
)

# Build a one-row gt table of counts for the 0/1 column named `x`
print_table <- function(x) {
  mydf |>
    # Recode 1/0 to labels; fct_expand() keeps both levels even if one is absent
    mutate(cat = if_else(!!sym(x) == 1, "Available", "Not available") |>
             forcats::fct_expand("Available", "Not available")) |>
    count(cat, .drop = FALSE) |>
    tidyr::pivot_wider(names_from = cat, values_from = n) |>
    mutate(Total = Available + `Not available`) |>
    gt() |>
    tab_header(glue::glue("{x} - Is high quality data available?"))
}

# Apply to every column and emit all tables into the HTML output
mydf |>
  names() |>
  purrr::map(print_table) |>
  htmltools::tagList()
```
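The Rmd above uses the same title text for every variable. Since the question mentions a separate table of variable names and labels, one possible way to pull the title from that table is a named lookup vector. This is only a sketch: `labels_df`, `label_lookup`, and `get_label` are hypothetical names, and the second label is a placeholder rather than a real label from the data.

```r
# Hypothetical label table, shaped like the name/label table in the question.
# The label for tab1_d2 is a placeholder for illustration only.
labels_df <- data.frame(
  variable = c("tab1_d1", "tab1_d2"),
  label    = c("Is high quality data available?", "Placeholder label for tab1_d2")
)

# Named character vector: variable name -> label
label_lookup <- setNames(labels_df$label, labels_df$variable)

# Return the label for a variable, falling back to the raw name if none is listed
get_label <- function(x) {
  if (x %in% names(label_lookup)) label_lookup[[x]] else x
}

get_label("tab1_d1")    # label found in the lookup
get_label("tab1_cd_1")  # no label listed, so the variable name is returned
```

Inside `print_table()`, `tab_header(get_label(x))` could then replace the `glue::glue()` call, so that each table is titled with its own label.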
Upvotes: 1