cat cat

Reputation: 65

How to plot many columns at once in ggplot

I have a data frame with many columns. The first column contains categories such as "System 1" and "System 2", and the remaining columns (one per question) contain 0s and 1s.

For example:

SYSTEM Q1 Q2
S1 0 1
S1 1 0
S2 1 1
S2 0 0
S2 1 1

How do I write R code that produces a violin plot for all the questions (Q1, Q2, Q3, ..., Q10) at the same time?

p <- ggplot(mydata, aes(x =system, y = q1, fill=system, color=system)) +
ggplot(mydata, aes(x =system, y = q2, fill=system, color=system)) +
ggplot(mydata, aes(x =system, y = q3, fill=system, color=system)) +
ggplot(mydata, aes(x =system, y = q100, fill=system, color=system)) +
 geom_violin(show.legend=FALSE) +
 stat_summary(fun.y=median, geom="point", size=2, color="red")+
  coord_flip()

Saving the PDF file:

filename <- fs::path(knitr::fig_path(), "file.pdf")

ggsave(filename)

plot(p)
invisible(dev.off())
knitr::include_graphics(filename)

I'm new to R, and I have spent a very long time on this, but it doesn't work at all.

I applied the code suggested by @Rui Barradas and got all the plots compared with each other in one figure. What I actually want is one figure per question, containing the plots for each system (System 1, then System 2) side by side, with the question as the title, e.g. Q1.

I also asked how to save a PDF file for each figure, using a for loop to save 100 PDF files in a folder.

Thank you again.

Upvotes: 2

Views: 126

Answers (1)

Rui Barradas

Reputation: 76641

This type of problem is generally a matter of reshaping the data: ggplot2 works best with data in long format, and your data is in wide format. See this post on how to reshape data from wide to long format.
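As a quick illustration of what "long format" means here, the sketch below reshapes the five posted rows with `tidyr::pivot_longer()`, the same function used in the code further down. Each row of the result holds one SYSTEM, one QUESTION and its 0/1 `value`, which is the shape ggplot2 needs in order to map QUESTION to an axis or a facet.

```r
library(tidyr)

# The five rows posted in the question, in wide format
x <- "SYSTEM  Q1  Q2
S1  0   1
S1  1   0
S2  1   1
S2  0   0
S2  1   1"
df1 <- read.table(text = x, header = TRUE)

# Wide -> long: every column except SYSTEM becomes a (QUESTION, value) pair
long <- pivot_longer(df1, -SYSTEM, names_to = "QUESTION")
print(long)
```

The 5-row wide table becomes 10 rows, one per system/question answer.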

Violin plots are not meant for binary data: you would get hourglass-shaped plots, wide at the extremes 0 and 1 and thin in the middle. Try a dot plot instead.
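One way to see the hourglass effect without plotting: a violin's outline is a kernel density estimate, and for 0/1 data that estimate piles up at 0 and 1 and is nearly zero in between. The base-R sketch below uses made-up data (400 zeros, 600 ones) and `stats::density()` with its default bandwidth; ggplot2's own bandwidth settings may differ, so treat it as an illustration only.

```r
# Made-up binary answers: 400 zeros and 600 ones (illustration only)
v <- rep(c(0, 1), times = c(400, 600))

# Kernel density estimate with R's default Gaussian kernel and bandwidth
d <- density(v)

# Estimated density at 0, 0.5 and 1: high, near zero, high
dens_at <- approx(d$x, d$y, xout = c(0, 0.5, 1))$y
round(dens_at, 3)
```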

suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
  library(tidyr)
})

df2 %>%
  pivot_longer(-SYSTEM, names_to = "QUESTION") %>% 
  group_by(SYSTEM, QUESTION) %>%
  summarise(Count = sum(value), .groups = "drop") %>% 
  ggplot(aes(SYSTEM, QUESTION)) +
  geom_point(aes(size = Count, color = Count)) +
  guides(color = guide_legend()) +
  labs(size = "Count", color = "Count") +
  theme_bw()

Created on 2022-09-17 with reprex v2.0.2


A violin plot would look like the following.

suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
  library(tidyr)
})

df2 %>%
  pivot_longer(-SYSTEM, names_to = "QUESTION") %>% 
  ggplot(aes(QUESTION, value)) +
  geom_violin() +
  facet_wrap(~ SYSTEM) +
  theme_bw()

Created on 2022-09-17 with reprex v2.0.2


Edit

To get a violin plot (once again, a bad idea for binary data) of value by SYSTEM, with one QUESTION per panel, just swap SYSTEM and QUESTION in the code above.
I have also changed the y-axis label: "Distribution" describes better what the plot represents.

df2 %>%
  pivot_longer(-SYSTEM, names_to = "QUESTION") %>% 
  ggplot(aes(SYSTEM, value)) +
  geom_violin() +
  ylab("Distribution") +
  facet_wrap(~ QUESTION) +
  theme_bw()

Created on 2022-09-18 with reprex v2.0.2
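The question also asks for one figure per question (the systems side by side, titled with the question), each saved as a separate PDF by a for loop. A minimal sketch under these assumptions: ggplot2 3.0 or later (for the `.data[[q]]` pronoun), the five posted rows as data (replace `df1` with `df2` or your own data frame), and a hypothetical output folder under `tempdir()`. Each iteration builds one plot and passes it to `ggsave()` explicitly through `plot = p`, so nothing depends on which plot was displayed last.

```r
library(ggplot2)

# The five rows posted in the question (swap in your own data frame)
x <- "SYSTEM  Q1  Q2
S1  0   1
S1  1   0
S2  1   1
S2  0   0
S2  1   1"
df1 <- read.table(text = x, header = TRUE)

# Hypothetical output folder; change to a real path as needed
out_dir <- file.path(tempdir(), "violin_plots")
dir.create(out_dir, showWarnings = FALSE)

# Every column except SYSTEM is treated as a question
questions <- setdiff(names(df1), "SYSTEM")

for (q in questions) {
  # One plot per question: systems side by side on the x axis
  p <- ggplot(df1, aes(x = SYSTEM, y = .data[[q]], fill = SYSTEM)) +
    geom_violin(show.legend = FALSE) +
    labs(title = q, y = "Distribution") +
    theme_bw()
  # Write <question>.pdf into the output folder (width/height in inches)
  ggsave(file.path(out_dir, paste0(q, ".pdf")), plot = p, width = 5, height = 4)
}
```

With 100 question columns the same loop writes one PDF per column, named after the column; the `width` and `height` values are arbitrary choices.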


Data

Here I make the posted data bigger.

x<-"SYSTEM  Q1  Q2
S1  0   1
S1  1   0
S2  1   1
S2  0   0
S2  1   1"
df1 <- read.table(textConnection(x), header = TRUE)

n <- 1e4
df2 <- data.frame(
  SYSTEM = sample(df1$SYSTEM, n, TRUE),
  Q1 = sample(df1$Q1, n, TRUE),
  Q2 = sample(df1$Q2, n, TRUE)
)

Created on 2022-09-17 with reprex v2.0.2

Upvotes: 3
