Reputation: 173
I am looping over a few frequency tables with the freq() function in summarytools and printing the results. In doing so, I noticed that when I save the freq() object without missing values and convert it to a data frame, the total number of observations still includes the missing values.
# Create a vector with 10 observations of "smoker"
smoker <- c("yes", "no", "yes", NA, NA, NA, "yes", "no", "yes", "no")
# Create a DataFrame using the vector
df <- data.frame(smoker)
library(summarytools)
library(dplyr)
# Create a frequency table without missing values
freq(df$smoker, report.nas = FALSE)
# Try to save this table into a data frame
table <- as.data.frame(freq(df$smoker, report.nas = FALSE)) # OR
table <- df %>% freq(smoker, report.nas = FALSE) %>% as.data.frame()
table
The results should look like this (missing values excluded, n=7):
Freq % % Cum.
no 3 42.86 42.86
yes 4 57.14 100.00
Total 7 100.00 100.00
But after saving it to a data.frame, it looks like this (missing values added back on, with total n=10):
Freq % Valid % Valid Cum. % Total % Total Cum.
no 3 42.85714 42.85714 30 30
yes 4 57.14286 100.00000 40 70
<NA> 3 NA NA 30 100
Total 10 100.00000 100.00000 100 100
This seems like a bug, but I'm not sure whether it is the expected outcome. Any thoughts on how to save this output as a data.frame? I'm hoping to loop over the data frames and add kable styling.
Upvotes: 0
Views: 104
Reputation: 17195
Using report.nas only affects the printing of the NA values, not the storage of them. If we store the freq object as see:
see <- summarytools::freq(df$smoker, report.nas = FALSE)
You can see it prints the values as desired:
# Frequencies
# df$smoker
# Type: Character
#
# Freq % % Cum.
# ----------- ------ -------- --------
# no 3 42.86 42.86
# yes 4 57.14 100.00
# Total 7 100.00 100.00
But it stores them along with the NA values, as the data frame output in the question shows.
So you will still need to subset to get what you want. This approach simply uses !is.na() on the % Valid column (column 2) to drop the <NA> row:
want <- as.data.frame(see[!is.na(see[,2]),])
# Freq % Valid % Valid Cum. % Total % Total Cum.
# no 3 42.85714 42.85714 30 30
# yes 4 57.14286 100.00000 40 70
# Total 10 100.00000 100.00000 100 100
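Note that the Total row above still counts all 10 observations, and the % Total columns are still based on n = 10. If you need a data frame whose total reflects only the 7 valid values, one option is to rebuild the table outside summarytools. Below is a minimal sketch that swaps freq() for base R's table() and prop.table(); the names valid_freq, Pct and PctCum are my own choices for illustration, not summarytools names:

```r
# Same example vector as in the question
smoker <- c("yes", "no", "yes", NA, NA, NA, "yes", "no", "yes", "no")

# table() drops NA by default (useNA = "no"), so only valid values are counted
counts <- table(smoker)
pct <- as.numeric(prop.table(counts)) * 100

# Hypothetical column names chosen for this sketch
valid_freq <- data.frame(
  Freq   = as.integer(counts),
  Pct    = pct,
  PctCum = cumsum(pct),
  row.names = names(counts)
)

# Append a Total row computed from the valid observations only
valid_freq["Total", ] <- list(sum(valid_freq$Freq), 100, 100)

valid_freq
```

Since this is a plain data.frame, it should be straightforward to build one per variable in a loop and pass each to kable, although you lose the summarytools print formatting.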
Upvotes: 1