jacerl980

Reputation: 11

Adding multiple labels (for columns and for observations)

I am new to R and have a rookie question.

I have a raw dataset with around 1400 variables that have automatically generated names like "row1__1". Most variables are dummies coded 0 and 1; some contain text. The variable names say little about what each variable contains, and they are not useful in output such as a frequency table.

I am looking for an easy way to add labels to columns so that instead of getting a table like:

   0   1
   5   10

I get an output like:

        Is high quality data available?
        Not available    Available     Total
Count   5                10            15  

Is there a way to tell R to use a label format each time a particular variable is used?

I have a separate matrix that maps each variable name to its variable label:

Variable name   Label                          
row1__1         Is high quality data available?   
...             ...                               

I don't have any useful raw data for the observation (value) labels (0 = Not available, 1 = Available), but I can build a dimension table for them.

I am sorry for asking such a basic question, but I can't seem to find any answers online. I am hoping to find a solution that imports the labels, so that I don't have to add them manually each time I produce a frequency table or similar output.

Kind regards,

Joe

I tried using table()

tab1_d1 tab1_d2 tab1_d3 tab1_cd_1 tab1_cd_2 tab1_cd_3 tab1_cvd2
1 1 0 0 1 0 1
0 1 0 0 1 0 1
1 0 0 1 0 0 0

Upvotes: 0

Views: 459

Answers (1)

Phil

Reputation: 8107

R doesn't produce nice-looking tables out of the box; for tables meant for readers, you need a package. There are a variety of packages out there. Here's one solution using the gt package:

library(gt)
library(dplyr)

dat <- data.frame(N = 5, Y = 10)

dat |> 
  mutate(Total = N + Y) |> 
  rename(`Not available` = N, Available = Y) |> 
  gt() |> 
  tab_header("Is high quality data available?")
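
If you would rather stay in base R for a quick console check, a minimal sketch is to attach the value labels with factor() and append the total yourself. The vector x here is made-up data that matches the counts in the question, and the 0/1 coding is assumed from the question:

```r
# Made-up data matching the counts in the question: five 0s and ten 1s
x <- c(rep(0, 5), rep(1, 10))

# Attach value labels by converting the 0/1 codes to a factor
x_lab <- factor(x, levels = c(0, 1), labels = c("Not available", "Available"))

# Count each level, then append the total as a named element
counts <- table(x_lab)
out <- c(counts, Total = sum(counts))

# Print the question text as a header line above the labelled counts
cat("Is high quality data available?\n")
print(out)
```

This only prints to the console; for reader-facing output the gt approach above is still the nicer option.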


EDIT: You can repeat this process across all variables and save the resulting tables in a document using something like the Rmd file below:

---
title: "Untitled"
author: "me"
date: "2023-01-23"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

```{r}
library(dplyr)
library(gt)

mydf <- tibble::tribble(
  ~tab1_d1,     ~tab1_d2,   ~tab1_d3,   ~tab1_cd_1,     ~tab1_cd_2,     ~tab1_cd_3,     ~tab1_cvd2,
  1,    1,  0,  0,  1,  0,  1,
  0,    1,  0,  0,  1,  0,  1,
  1,    0,  0,  1,  0,  0,  0
)

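# Build a labelled frequency table for one column of mydf (x is the column name)
print_table <- function(x) {
  mydf |> 
    # Recode 0/1 to labels; fct_expand() keeps both levels even if one never occurs
    mutate(cat = if_else(!! sym(x) == 1, "Available", "Not available") |> 
             forcats::fct_expand("Available", "Not available")) |> 
    # .drop = FALSE keeps zero-count levels, so both columns always exist
    count(cat, .drop = FALSE) |> 
    tidyr::pivot_wider(names_from = cat, values_from = n) |> 
    mutate(Total = Available + `Not available`) |> 
    gt() |> 
    tab_header(glue::glue("{x} - Is high quality data available?"))
}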

mydf |> 
  names() |> 
  purrr::map(print_table) |> 
  htmltools::tagList()
```
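
The Rmd above uses the same title for every variable. Since the question mentions a separate table of variable names and labels, here is a rough base R sketch of looking the title up per variable instead. The lookup table var_labels, the vector value_labels, the helper labelled_counts(), and the label text for tab1_d2 are all hypothetical and only there for illustration; the label for tab1_d1 is borrowed from the question's example.

```r
# First two columns of the sample data from the question
mydf <- data.frame(
  tab1_d1 = c(1, 0, 1),
  tab1_d2 = c(1, 1, 0)
)

# Hypothetical lookup table: one row per variable name with its label
var_labels <- data.frame(
  name  = c("tab1_d1", "tab1_d2"),
  label = c("Is high quality data available?",
            "Hypothetical label for tab1_d2")
)

# Hypothetical value labels: names are the raw codes, values are the labels
value_labels <- c("0" = "Not available", "1" = "Available")

# Hypothetical helper: labelled counts plus a total for one variable
labelled_counts <- function(var, data, var_labels, value_labels) {
  x <- factor(data[[var]],
              levels = names(value_labels),
              labels = unname(value_labels))
  counts <- table(x)
  list(
    title  = var_labels$label[match(var, var_labels$name)],
    counts = c(counts, Total = sum(counts))
  )
}

# Apply the helper to every variable listed in the lookup table
tables <- lapply(var_labels$name, labelled_counts,
                 data = mydf,
                 var_labels = var_labels,
                 value_labels = value_labels)

# Print each table with its looked-up title
for (tb in tables) {
  cat(tb$title, "\n")
  print(tb$counts)
}
```

The same match() lookup could replace the fixed string passed to tab_header() in print_table() if you want the gt output to carry per-variable titles.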

Upvotes: 1
