Reputation: 89
I'm generating violin plots in ggplot2 for a time series, year_1 to year_32. The years in my data frame are stored as numeric values. From the examples I've seen, it seems that I must convert these numeric year values to factors to plot one violin per year; and in fact, if I run the code without as.factor, I get one big fat violin. I would like to understand why geom_violin can't have numeric values on the x axis; or, if I'm wrong about that, how to use them.
So:
my_data$year <- as.factor(my_data$year)
p <- ggplot(data = my_data, aes(x = year, y = continuous_var)+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label")
p +my_theme()
works fine, but if I skip
my_data$year <- as.factor(my_data$year)
it doesn't work, I get one big fat violin for all years. Why?
TIA
Upvotes: 1
Views: 1974
Reputation: 4456
PS: this discussion would better fit Cross Validated, as it's more of a statistics than a coding question.
I'm not 100% sure, but here's my explanation: a violin plot shows the density of a set of data, and you can divide your data into groups so that one violin is drawn for each group. But if the variable you're using to define the groups (the x axis) is continuous, there is no finite set of groups to split on (one group for the values at 0, one for 0.1, one for 0.01, etc.), so in the end the data can't be divided, and ggplot probably doesn't use the x variable for grouping and makes one violin for all your data.
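To illustrate the grouping point, here is a minimal sketch rather than a definitive answer. As far as I know, ggplot2 only builds groups automatically from discrete variables, so with a numeric x it falls back to a single group unless you supply the group aesthetic yourself. The example uses the built-in ToothGrowth dataset as a stand-in (an assumption, since the original my_data is not available), with the numeric dose column playing the role of year.

```r
library(ggplot2)

# ToothGrowth ships with R; dose is numeric (0.5, 1, 2) and stands in for year
my_data <- ToothGrowth

# Keep dose numeric on the x axis, but state the grouping explicitly
p <- ggplot(my_data, aes(x = dose, y = len, group = dose)) +
  geom_violin(fill = "#FF0000", color = "#000000") +
  labs(x = "dose (numeric)", y = "len")
p
```

With group = dose the x axis stays continuous, so the violins sit at their true numeric positions instead of being evenly spaced as they are with a factor.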
Upvotes: 0
Reputation: 79311
You are missing a closing ) at the end of this line:
p <- ggplot(data = my_data, aes(x = year, y = continuous_var)
I have constructed a reproducible example with the ToothGrowth dataset. This should work now:
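library(ggplot2)
# ToothGrowth is a built-in dataset; dose is numeric by default
my_data <- ToothGrowth
# Convert dose to a factor so that one violin is drawn per dose level
my_data$dose <- as.factor(my_data$dose)
p <- ggplot(data = my_data, aes(x = dose, y = len)) +
  geom_violin(fill = "#FF0000", color = "#000000") +
  ylim(0, 500) +
  labs(x = "x_label", y = "y_label") +
  theme_bw()
p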
Upvotes: 1