Reputation: 239
I recently came across the skimr package, which helps create useful summary statistics. I have written the following code to extract summary statistics for numeric columns only. My first question: does skimr offer a more direct way to specify the variable types for which I want summary statistics? My second question: what does append = TRUE actually achieve when I create the my_skim "closure"?
library(skimr)
library(dplyr)
### Creating an example dataset
test.df1 <- data.frame("Year" = sample(2018:2020, 20, replace = TRUE),
                       "Firm" = head(LETTERS, 5),
                       "Exporter" = sample(c("Yes", "No"), 20, replace = TRUE),
                       "Revenue" = sample(100:200, 20, replace = TRUE),
                       stringsAsFactors = FALSE)
test.df1 <- rbind(test.df1,
                  data.frame("Year" = c(2018, 2018),
                             "Firm" = c("Y", "Z"),
                             "Exporter" = c("Yes", "No"),
                             "Revenue" = c(NA, NA)))
test.df1 <- test.df1 %>% mutate(Profit = Revenue - sample(20:30, 22, replace = TRUE))
### Using skimr package to extract summary stats
my_skim <- skim_with(numeric = sfl(minimum = min, maximum = max, hist = NULL), append = TRUE)
test.df1_skim1 <- test.df1 %>%
  group_by(Year) %>%
  my_skim() %>%
  filter(skim_type != "character") %>%
  select(-starts_with("character"))
Upvotes: 0
Views: 835
Reputation: 6755
If you only want a summary of the numeric variables, you could either set all the other types to NULL, or run the skim and use yank() to get the subtable for one type.
From https://docs.ropensci.org/skimr/articles/skimr.html#reshaping-the-results-from-skim-
skim(Orange) %>% yank("numeric")
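Applied to grouped data like the question's, the same idea might look like the sketch below. The toy data frame is hypothetical and only mirrors the question's column names. The second option, selecting numeric columns with dplyr before skimming, is an alternative not mentioned above and assumes dplyr 1.0 or later for where().

```r
library(skimr)
library(dplyr)

# Hypothetical toy data mirroring the question's columns
toy <- data.frame(
  Year     = rep(2018:2019, each = 3),
  Firm     = c("A", "B", "C", "A", "B", "C"),
  Exporter = c("Yes", "No", "Yes", "No", "Yes", "No"),
  Revenue  = c(120, 150, 180, 130, 160, 190),
  stringsAsFactors = FALSE
)

# Option 1: skim everything, then pull out the numeric subtable
numeric_only <- toy %>%
  group_by(Year) %>%
  skim() %>%
  yank("numeric")

# Option 2: keep only numeric columns before skimming
# (select() retains the grouping variable automatically)
numeric_skim <- toy %>%
  group_by(Year) %>%
  select(where(is.numeric)) %>%
  skim()

print(numeric_only)
print(numeric_skim)
```

Either way, the filter() and select(-starts_with("character")) steps from the question should no longer be needed.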
The append option controls whether the functions you supply are added to the default statistics for that type (append = TRUE, which is the default) or replace them (append = FALSE).
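A quick way to see the difference is to build two skimmers that differ only in append and compare the columns they produce. This is a minimal sketch using the built-in mtcars data; the names skim_appended and skim_replaced are made up for illustration.

```r
library(skimr)

# Same custom functions, differing only in the append argument
skim_appended <- skim_with(numeric = sfl(minimum = min, maximum = max),
                           append = TRUE)
skim_replaced <- skim_with(numeric = sfl(minimum = min, maximum = max),
                           append = FALSE)

appended <- skim_appended(mtcars)
replaced <- skim_replaced(mtcars)

# Columns produced by each skimmer
print(names(appended))
print(names(replaced))

# Default numeric statistics that survive only when append = TRUE
print(setdiff(names(appended), names(replaced)))
```

With append = TRUE the custom minimum and maximum columns sit alongside the default numeric statistics; with append = FALSE only the custom ones remain for numeric columns.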
Upvotes: 3