vasili111
vasili111

Reputation: 6940

How to get descriptive table for both continuous and categorical variables?

I want to get descriptive table in html format for all variables that are in data frame. I need for continuous variables mean and standard deviation. For categorical variables frequency (absolute count) of each category and percentage of each category. Also I need the count of missing values to be included.

Lets use this data:

data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA

I want to get table in html format that will look like this:

----------------------------------------------------------------------
Variables       N (missing)     Mean (SD)  / %
----------------------------------------------------------------------
len               59 (1)             18.9 (7.65)
supp
   OJ            30                   50%
   VC            29                   48.33%
   NA            1                    1.67%
dose            60                   1.17 (0.629)

I need also to set the number of digits after decimal point to show.

If you know better variant to display that information in html in better way than please provide your solution.

Upvotes: 0

Views: 1604

Answers (2)

vasili111
vasili111

Reputation: 6940

I found r package table1 that does what I want. Here is a code:

library(table1)
data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA
table1(reformulate(colnames(df)), data=df)

Upvotes: 0

Gregory
Gregory

Reputation: 4279

Here's a programatic way to create separate summary tables for the numeric and factor columns. Note that this doesn't make note of NAs in the table as you requested, but does ignore NAs to calculate summary stats as you did. It's a starting point, anyway. From here you could combine the tables and format the headers however you want.

If you knit this code within an RMarkdown document with HTML output, kable will automatically generate the html table and a css will format the table nicely with a horizontal rules as pictured below. Note that there's also a booktabs option to kable that makes prettier tables like the LaTeX booktabs package. Otherwise, see the documentation for knitr::kable for options.

library(dplyr)
library(tidyr)
library(knitr)

data("ToothGrowth")
df<-ToothGrowth
df$len[2]<-NA
df$supp[5]<-NA

numeric_cols <- dplyr::select_if(df, is.numeric) %>%
  gather(key = "variable", value = "value") %>%
  group_by(variable) %>%
  summarize(count = n(),
            mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE))

factor_cols <- dplyr::select_if(df, is.factor) %>%
  gather(key = "variable", value = "value") %>%
  group_by(variable, value) %>%
  summarize(count = n()) %>%
  mutate(p = count / sum(count, na.rm = TRUE))

knitr::kable(numeric_cols)

enter image description here

knitr::kable(factor_cols)

enter image description here

Upvotes: 1

Related Questions