Kristine

Reputation: 173

Saving freq() object into data frame

I am looping over several frequency tables with the freq() function from summarytools and printing the results. When I save a freq() object created with report.nas = FALSE and convert it to a data frame, the total number of observations still includes the missing values.

# Create a vector with 10 observations of "smoker"
smoker <- c("yes", "no", "yes", NA, NA, NA, "yes", "no", "yes", "no")

# Create a DataFrame using the vector
df <- data.frame(smoker)

library(summarytools)
library(dplyr)

# Create a frequency table without missing values
freq(df$smoker, report.nas = FALSE)

# Try to save this table into a data frame (both forms give the same result)
table <- as.data.frame(freq(df$smoker, report.nas = FALSE))
# OR, with the pipe:
table <- df %>% freq(smoker, report.nas = FALSE) %>% as.data.frame()
table

The results should look like this (missing values excluded, n=7):

          Freq        %   % Cum.
     no      3    42.86    42.86
    yes      4    57.14   100.00
  Total      7   100.00   100.00

But after converting it to a data.frame, it looks like this (missing values added back in, with total n=10):

      Freq   % Valid % Valid Cum. % Total % Total Cum.
no       3  42.85714     42.85714      30           30
yes      4  57.14286    100.00000      40           70
<NA>     3        NA           NA      30          100
Total   10 100.00000    100.00000     100          100

This seems like a bug, but I'm not sure whether it is the expected behavior. Any thoughts on how to save this output as a data.frame? I'm hoping to loop over the data frames and add kable styling.

Upvotes: 0

Views: 104

Answers (1)

jpsmith

Reputation: 17195

The report.nas argument only affects how the NA values are printed, not whether they are stored. If we store the freq object as see:

see <- summarytools::freq(df$smoker, report.nas = FALSE)

You can see it prints the values as desired:

# Frequencies  
# df$smoker  
# Type: Character  
# 
#        Freq        %   % Cum.
# ----------- ------ -------- --------
#          no      3    42.86    42.86
#         yes      4    57.14   100.00
#       Total      7   100.00   100.00

But it stores them with the NA values, which is why converting the object gives the five-column table from the question, including the <NA> row and a Total of 10.

So you will still need to subset to get what you want. This approach simply uses !is.na() on the % Valid column (column 2) to drop the <NA> row:

want <- as.data.frame(see[!is.na(see[,2]),])

#       Freq   % Valid % Valid Cum. % Total % Total Cum.
# no       3  42.85714     42.85714      30           30
# yes      4  57.14286    100.00000      40           70
# Total   10 100.00000    100.00000     100          100
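Note that the subset above drops the <NA> row but leaves the Total row at 10 and keeps the % Total columns. If the goal is the three-column table from the question with n=7, one option is to skip the stored freq() object and rebuild the table from the raw vector in base R. This is a minimal sketch, not summarytools' own method: valid_freq() and the column names Pct and Pct_Cum are hypothetical names chosen for illustration.

```r
# Base-R sketch: build an NA-free frequency table straight from the raw vector.
# valid_freq(), Pct and Pct_Cum are illustrative names, not part of summarytools.
smoker <- c("yes", "no", "yes", NA, NA, NA, "yes", "no", "yes", "no")

valid_freq <- function(x) {
  counts <- table(x, useNA = "no")              # NA values are left out of the counts
  n_valid <- sum(counts)                        # number of non-missing observations
  pct <- 100 * as.vector(counts) / n_valid      # percentages over valid cases only

  body <- data.frame(
    Freq = as.vector(counts),
    Pct = pct,
    Pct_Cum = cumsum(pct),
    row.names = names(counts)
  )
  total <- data.frame(
    Freq = n_valid,
    Pct = 100,
    Pct_Cum = 100,
    row.names = "Total"
  )
  rbind(body, total)                            # plain data.frame, Total row last
}

want <- valid_freq(smoker)
print(want)
```

Because the result is an ordinary data.frame, it should be usable inside a loop and can be passed on to a table formatter such as knitr::kable(want, digits = 2).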


Upvotes: 1
