Reputation: 65
I have a data frame with many columns. The first column contains categories such as "System 1" and "System 2", and the remaining columns contain the answers to each question, coded as 0 or 1.
For example:
SYSTEM | Q1 | Q2 |
---|---|---|
S1 | 0 | 1 |
S1 | 1 | 0 |
S2 | 1 | 1 |
S2 | 0 | 0 |
S2 | 1 | 1 |
How can I write R code to produce a violin plot for all the questions (Q1, Q2, Q3, ..., Q10) at the same time? This is what I tried:
p <- ggplot(mydata, aes(x =system, y = q1, fill=system, color=system)) +
ggplot(mydata, aes(x =system, y = q2, fill=system, color=system)) +
ggplot(mydata, aes(x =system, y = q3, fill=system, color=system)) +
ggplot(mydata, aes(x =system, y = q100, fill=system, color=system)) +
geom_violin(show.legend=FALSE) +
stat_summary(fun.y=median, geom="point", size=2, color="red")+
coord_flip()
# save pdf file
filename <- fs::path(knitr::fig_path(), "file.pdf")
ggsave(filename)
plot(p)
invisible(dev.off())
knitr::include_graphics(filename)
I'm new to R. I have spent a very long time on this, but it doesn't work at all.
I applied the code suggested by @Rui Barradas, and I got all the questions plotted against each other. What I actually want is one figure per question that shows the two systems (system 1, then system 2) side by side, with the question name (for example Q1) as the title.
Please, I also asked how to save each figure as a PDF file, using a for loop to save 100 PDF files in a folder.
Thank you again.
Upvotes: 2
Views: 126
Reputation: 76641
This type of problem generally has to do with reshaping the data. The plotting code expects the long format, and the data is in wide format. See this post on how to reshape the data from wide to long format.
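As a minimal sketch of the reshaping step, the example below rebuilds the five rows from the question's table (the object names `wide` and `long` are placeholders chosen for illustration) and converts them with `tidyr::pivot_longer()`. Each answer ends up on its own row, with the question name in `QUESTION` and the 0/1 answer in the default `value` column.

```r
library(tidyr)

# Small wide-format example mirroring the table in the question
# (`wide` and `long` are illustrative names, not from the original post)
wide <- data.frame(
  SYSTEM = c("S1", "S1", "S2", "S2", "S2"),
  Q1 = c(0, 1, 1, 0, 1),
  Q2 = c(1, 0, 1, 0, 1)
)

# One row per (SYSTEM, QUESTION) answer; the answers go to the
# default "value" column
long <- pivot_longer(wide, -SYSTEM, names_to = "QUESTION")

print(long)
```

With the data in this shape, `QUESTION` can be mapped to an axis or a facet like any other variable.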
Violin plots are not meant for binary data: you would get hourglass-shaped plots, full at the extremes 0 and 1 and slim in the middle. Try a dot plot instead.
suppressPackageStartupMessages({
library(ggplot2)
library(dplyr)
library(tidyr)
})
df2 %>%
pivot_longer(-SYSTEM, names_to = "QUESTION") %>%
group_by(SYSTEM, QUESTION) %>%
summarise(Count = sum(value), .groups = "drop") %>%
ggplot(aes(SYSTEM, QUESTION)) +
geom_point(aes(size = Count, color = Count)) +
guides(color = guide_legend()) +
labs(size = "Count", color = "Count") +
theme_bw()
Created on 2022-09-17 with reprex v2.0.2
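One possible variation, offered only as a sketch: `sum(value)` counts the 1s, so the point sizes also reflect how many rows each system has. If the systems have different numbers of rows, the proportion of 1s, `mean(value)`, may be easier to compare. The data frame `wide` and the names `props` and `p` below are placeholders for illustration, built from the five rows in the question.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Illustrative data: the five rows from the question's table
wide <- data.frame(
  SYSTEM = c("S1", "S1", "S2", "S2", "S2"),
  Q1 = c(0, 1, 1, 0, 1),
  Q2 = c(1, 0, 1, 0, 1)
)

# Proportion of 1s and group size for each SYSTEM/QUESTION pair
props <- wide %>%
  pivot_longer(-SYSTEM, names_to = "QUESTION") %>%
  group_by(SYSTEM, QUESTION) %>%
  summarise(Proportion = mean(value), N = n(), .groups = "drop")

# Same dot plot layout, sized by proportion instead of count
p <- ggplot(props, aes(SYSTEM, QUESTION)) +
  geom_point(aes(size = Proportion)) +
  theme_bw()
```

Printing `p` draws the plot; `props` can also be inspected directly as a small summary table.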
A violin plot would look like the following.
suppressPackageStartupMessages({
library(ggplot2)
library(dplyr)
library(tidyr)
})
df2 %>%
pivot_longer(-SYSTEM, names_to = "QUESTION") %>%
ggplot(aes(QUESTION, value)) +
geom_violin() +
facet_wrap(~ SYSTEM) +
theme_bw()
To have a violin plot (once again, a bad idea for binary data) of SYSTEM vs. values, with one QUESTION per panel, just swap the relevant variables in the code above, SYSTEM and QUESTION.
I have also edited the y axis label; "Distribution" is more descriptive of what the plot is representing.
df2 %>%
pivot_longer(-SYSTEM, names_to = "QUESTION") %>%
ggplot(aes(SYSTEM, value)) +
geom_violin() +
ylab("Distribution") +
facet_wrap(~ QUESTION) +
theme_bw()
Created on 2022-09-18 with reprex v2.0.2
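For the request to save one PDF per question, a loop over the question names is one way to do it. The sketch below is an assumption-laden illustration, not tested against the real data: the simulated data frame `wide`, the output folder `out_dir` (placed under `tempdir()` here) and the file naming scheme are all hypothetical and should be adapted. Each file holds one figure with the two systems side by side and the question name as the title.

```r
library(ggplot2)
library(tidyr)

# Hypothetical data shaped like the table in the question
set.seed(1)
wide <- data.frame(
  SYSTEM = sample(c("S1", "S2"), 200, TRUE),
  Q1 = rbinom(200, 1, 0.5),
  Q2 = rbinom(200, 1, 0.5)
)
long <- pivot_longer(wide, -SYSTEM, names_to = "QUESTION")

# Hypothetical output folder; change it to wherever the files should go
out_dir <- file.path(tempdir(), "question_plots")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One figure per question, saved as <question>.pdf
for (q in unique(long$QUESTION)) {
  p <- ggplot(long[long$QUESTION == q, ], aes(SYSTEM, value, fill = SYSTEM)) +
    geom_violin(show.legend = FALSE) +
    labs(title = q, y = "Distribution") +
    theme_bw()
  ggsave(file.path(out_dir, paste0(q, ".pdf")), plot = p, width = 6, height = 4)
}
```

Passing `plot = p` to `ggsave()` explicitly avoids relying on the last plot displayed, which matters inside a loop where nothing is printed.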
Here I make the posted data bigger by resampling its columns with replacement.
x<-"SYSTEM Q1 Q2
S1 0 1
S1 1 0
S2 1 1
S2 0 0
S2 1 1"
df1 <- read.table(textConnection(x), header = TRUE)
n <- 1e4
df2 <- data.frame(
SYSTEM = sample(df1$SYSTEM, n, TRUE),
Q1 = sample(df1$Q1, n, TRUE),
Q2 = sample(df1$Q2, n, TRUE)
)
Upvotes: 3