Reputation: 5925
I am using the R programming language. I created the following data set for this example:
var_1 <- rnorm(1000,10,10)
var_2 <- rnorm(1000, 5, 5)
var_3 <- rnorm(1000, 6,18)
favorite_food <- c("pizza","ice cream", "sushi", "carrots", "onions", "broccoli", "spinach", "artichoke", "lima beans", "asparagus", "eggplant", "lettuce", "cucumbers")
favorite_food <- sample(favorite_food, 1000, replace=TRUE, prob=c(0.5, 0.45, 0.04, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001))
response <- c("a","b")
response <- sample(response, 1000, replace=TRUE, prob=c(0.3, 0.7))
data = data.frame( var_1, var_2, var_3, favorite_food, response)
data$favorite_food = as.factor(data$favorite_food)
data$response = as.factor(data$response)
From here, I want to make histograms for the two categorical variables in this data set and put them on the same page:
#make histograms and put them on the same page (note: I don't know why the "par(mfrow = c(1,2))" statement is not working)
par(mfrow = c(1,2))
histogram(data$response, main = "response")
histogram(data$favorite_food, main = "favorite food")
My question: Is it possible to automatically produce histograms for all categorical variables in a given data set (without manually writing the "histogram()" statement for each variable) and print them on the same page? Is it better to use the "ggplot2" library for this problem instead?
I can manually write the "histogram()" statement for each individual categorical variable in the data set, but I am looking for a quicker way to do this. Is it possible to do this with a "for loop"?
Thanks
Upvotes: 0
Views: 515
Reputation: 4497
Here is an attempt using cowplot & ggplot2:
library(ggplot2)
library(dplyr)
library(foreach)
library(cowplot)
list_variables <- c("response", "favorite_food")
all_plot <- foreach(current_var = c(list_variables)) %do% {
# Give each summary its own name so that later plots do not all reference the same summary data.
data_summary_name <- paste0(current_var, "_summary")
eval(substitute(
{
graph_data <- data %>%
group_by(!!sym(current_var)) %>%
summarize(count = n(), .groups = "drop") %>%
mutate(share = count / sum(count))
plot <- ggplot(graph_data) +
geom_bar(mapping = aes(x = !!sym(current_var), y = share), width = 1,
fill = "#00FFFF", color = "#000000", stat = "identity") +
scale_y_continuous(labels = scales::percent) +
ggtitle(current_var) + ylab("Percent of Total") +
theme_bw()
}, list(graph_data = as.name(data_summary_name))
))
return(plot)
}
plot_grid(plotlist = all_plot, ncol = 2)
Note: For an explanation of why I use eval & substitute, see this question: "ggplot2 generate same plot for different variables in a for loop".
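One possible way to sidestep the shared-data problem without eval & substitute is to build each plot inside a function, so that every plot gets its own local summary data. The following is only a minimal sketch, not the method used above: the helper name make_share_plot and the small data frame df are made up for illustration, and the shares are computed with base R's table() rather than dplyr.

```r
library(ggplot2)
library(cowplot)

# Hypothetical stand-in for the `data` object from the question
set.seed(42)
df <- data.frame(
  response = factor(sample(c("a", "b"), 200, replace = TRUE, prob = c(0.3, 0.7))),
  favorite_food = factor(sample(c("pizza", "ice cream", "sushi"), 200, replace = TRUE))
)

# Hypothetical helper: one share bar chart for one categorical column.
# `counts` is local to each call, so plots cannot end up sharing data.
make_share_plot <- function(df, var) {
  counts <- as.data.frame(table(value = df[[var]]))
  counts$share <- counts$Freq / sum(counts$Freq)
  ggplot(counts, aes(x = value, y = share)) +
    geom_col(fill = "#00FFFF", color = "#000000") +
    scale_y_continuous(labels = scales::percent) +
    ggtitle(var) + ylab("Percent of Total") +
    theme_bw()
}

list_variables <- c("response", "favorite_food")
all_plot <- lapply(list_variables, make_share_plot, df = df)

plot_grid(plotlist = all_plot, ncol = 2)
```

Because the column names inside counts are fixed ("value" and "share"), no tidy evaluation is needed in aes().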
Using facet_wrap, an approach similar to QuishSwash's, but with the data calculated as shares instead:
list_variables <- c("response", "favorite_food")
# Calculate the share for the chosen variables defined in list_variables
# You could instead select the variables programmatically based on some condition
summary_df <- bind_rows(foreach(current_var = c(list_variables)) %do% {
data %>%
group_by(variable = !!sym(current_var)) %>%
summarize(count = n(), .groups = "drop") %>%
mutate(share = count / sum(count),
variable_name = current_var)
})
ggplot(summary_df) +
geom_bar(
aes(x = variable, y = share),
fill = "#00FFFF", color = "#000000", stat = "identity") +
facet_wrap(~variable_name, scales = "free") +
scale_y_continuous(labels = scales::percent) +
theme_bw()
Created on 2021-04-29 by the reprex package (v2.0.0)
Upvotes: 1
Reputation: 389135
Here's a base R alternative using barplot in a for loop:
cols <- names(data)[sapply(data, is.factor)]
#This would need some manual adjustment if the number of columns increases
par(mfrow = c(1,length(cols)))
for(i in cols) {
barplot(table(data[[i]]), main = i)
}
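If the number of factor columns grows, the panel layout can be computed rather than adjusted by hand. The sketch below assumes that a roughly square grid is acceptable and uses grDevices::n2mfrow() to pick the rows and columns; the data frame df is a small hypothetical stand-in for the question's data.

```r
# Small hypothetical data frame standing in for `data` from the question
set.seed(1)
df <- data.frame(
  x = rnorm(50),
  response = factor(sample(c("a", "b"), 50, replace = TRUE)),
  favorite_food = factor(sample(c("pizza", "sushi", "carrots"), 50, replace = TRUE)),
  colour = factor(sample(c("red", "blue"), 50, replace = TRUE))
)

cols <- names(df)[vapply(df, is.factor, logical(1))]

# n2mfrow() suggests a c(rows, columns) layout for a given number of panels
layout_dims <- grDevices::n2mfrow(length(cols))

old_par <- par(mfrow = layout_dims)
for (i in cols) {
  barplot(table(df[[i]]), main = i)
}
par(old_par)  # restore the previous graphics settings
```

Saving and restoring par() keeps the multi-panel layout from leaking into later plots.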
Upvotes: 3
Reputation: 3335
A ggplot2/tidyverse solution is to pivot the columns into long format and then use faceting to plot them all on the same page (edited to plot only factor variables):
factor_vars <- sapply(data, is.factor)
varnames <- names(data)
deselect_not_factors <- varnames[!factor_vars]
library(tidyr)
library(ggplot2)
data_long <- data %>%
pivot_longer(
cols = -all_of(deselect_not_factors),
names_to = "category",
values_to = "value"
)
ggplot(data_long) +
geom_bar(
aes(x = value)
) +
facet_wrap(~category, scales = "free")
Upvotes: 4
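As a possible shortcut, the factor columns can be selected directly inside pivot_longer() with the tidyselect helper where(), which avoids building the list of columns to drop. This is a sketch under the assumption of a reasonably recent tidyr; the data frame df is a small hypothetical stand-in for the question's data.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the `data` object from the question
set.seed(42)
df <- data.frame(
  var_1 = rnorm(200),
  response = factor(sample(c("a", "b"), 200, replace = TRUE)),
  favorite_food = factor(sample(c("pizza", "ice cream", "sushi"), 200, replace = TRUE))
)

# Pivot only the factor columns; numeric columns are left untouched
data_long <- pivot_longer(
  df,
  cols = where(is.factor),
  names_to = "category",
  values_to = "value"
)

ggplot(data_long) +
  geom_bar(aes(x = value)) +
  facet_wrap(~category, scales = "free")
```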
Reputation: 116
As an alternative, you can capitalize on the fantastic DataExplorer package.
Note that histograms are for continuous variables, so what you actually want for your categorical variables is bar plots. This can be done as follows:
if (!requireNamespace("DataExplorer", quietly = TRUE)) install.packages("DataExplorer")
library(DataExplorer)
DataExplorer::plot_histogram(data) # plots histograms for continuous variables
DataExplorer::plot_bar(data) # bar plots for categorical variables
Please refer to the package manual for more details.
Upvotes: 2