Reputation: 123
I have an issue with plotting continuous distributions without sharp edges in ggplot2. I need to show two of them on one plot. Also, it does not have to be ggplot2 to achieve this result.
I am aware, that the number of data points directly influences the smoothness of the line, but it was not the case here. Below you can see some sample data (from dput)
sample.data<-list(beta.data = structure(list(cluster = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), beta.density = c(0, 3.42273368363934e-43, 8.42987148403021e-29,
2.04764468657484e-20, 1.69485562831516e-14, 6.07999638837842e-10, 2.88180370232676e-06, 0.00314064636750876, 0.954118897015866, 0, 0, 3.80101893822358e-36, 6.43342582657081e-22, 6.82956252277493e-14, 1.75291058931833e-08, 0.000131874335695378, 0.0750918340641428, 3.72532418759802, 5.05242078519544, 0), pr = c(0, 0.111111111111111, 0.222222222222222, 0.333333333333333, 0.444444444444444, 0.555555555555556, 0.666666666666667, 0.777777777777778, 0.888888888888889, 1, 0, 0.111111111111111, 0.222222222222222, 0.333333333333333, 0.444444444444444, 0.555555555555556, 0.666666666666667, 0.777777777777778, 0.888888888888889, 1)), row.names = c(NA, -20L), class = "data.frame"), beta.params = structure(list(cluster = 1:2, a = c(49, 50), b = c(2, 10), ni.beta = c(0.961,0.833), sd.beta = c(0.00072, 0.00228)), row.names = c(NA,-2L), class = "data.frame"))
Before I was using geom_col, but it discretizes values. I went with geom_area:
ggplot(sample.data$beta.data, aes(x = pr, y = beta.density)) +
geom_area(stat = "function",
fun = dbeta,
args = list(shape1 = sample.data$beta.params[1,"a"], shape2 = sample.data$beta.params[1,"b"]),
fill = "#F1C40F",
colour = "black",
alpha = 0.7) +
geom_area(stat = "function",
fun=dbeta,
args = list(shape1 = sample.data$beta.params[2,"a"], shape2 = sample.data$beta.params[2,"b"]),
fill = "#3498DB",
colour = "black",
alpha = 0.7)
I presented you the data with 10 points, but 1000 points look almost the same. It is not the case here, where even 100 points looks ok:
p = seq(0,1, length=100)
plot(p, dbeta(p, 50, 10), ylab="Density", type ="l", col=4, , lwd = 2)
Here I am attaching code to simulate the data. Oh, and these troublesome beta parameters were a = 49 and b = 2.
len <- 100
p <- seq(0,1, length.out = len)
df <- data.frame(rbind(cbind("cl" = rep(1, times = length(p)), "beta" = dbeta(p, 50, 10),"p"= p),
cbind("cl" = rep(1, times = length(p)), "beta" = dbeta(p, 40, 2),"p"= p)))
Do you have any ideas?
EDIT: The pdfs stands here for probability density functions. That is why I have not put "pdf" as a tag. My apologies for the confusion! Anyway, when I tried to print graphic to PDF file, the result was poor as well (sharp edges). But it the end, it shouldn't matter. I want to see smooth lines whatever I do (reasonably).
EDIT2 It is possible to achieve because:
library(mosaic)
theme_set(theme_bw())
xpbeta(c(0.7, 0.90), shape1 = 49, shape2 = 2)
It produces nice, smoothed beta dist with parameters (49, 2). But then again, I need to show two dists in one chart.
Upvotes: 1
Views: 179
Reputation: 123
I have found the answer. It still needs some editing (like transparency/alpha which I couldn't figure out), but in general, this is what I meant. Code:
library(mosaic)
plotDist('beta', params=list(49,2), kind='density', type = "h", col = "#3498DB", xlim = c(0,1))
plotDist('beta', params=list(50, 10), kind='density', , type = "h", col = "#F1C40F", add = TRUE)
plotDist('beta', params=list(49,2), kind='density', add = TRUE, col = "black")
plotDist('beta', params=list(50, 10), kind='density', add = TRUE, col = "black")
We can add as many distributions as we want, using "add" parameter. Parameter type = "h", is used to draw filled distribution. Without it, the only line is visible. In my answer, I draw the two lines and two filled dists. I would be really happy if someone could show a better answer, though.
EDIT:
I think I found my perfect answer!
Here is the code:
library(ggformula)
theme_set(theme_bw())
gf_dist("beta", shape1 = 49, shape2 = 2, geom = "area", alpha = 0.5, fill = "#F1C40F") %>%
gf_dist("beta", shape1 = 49, shape2 = 2) %>%
gf_dist("beta", shape1 = 50, shape2 = 10, geom = "area", alpha = 0.5, fill = "#3498DB") %>%
gf_dist("beta", shape1 = 50, shape2 = 10)
It is much faster than the previous code, parameter alpha is obvious and it is relatively easy to combine many plots! Because of transparency, you can nicely see the overlap of both distributions.
Upvotes: 1