Reputation: 491
I am having trouble converting a ggplot to plotly. The boxes are properly grouped in the ggplot boxplot, but they overlap in the plotly graph.
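library(reshape2)  # melt() is assumed to come from reshape2
library(ggplot2)
library(plotly)
data(iris)
# Reshape to long format and keep only the two sepal measurements
melted <- melt(iris, id.vars = "Species")
melted <- subset(melted, variable %in% c("Sepal.Length", "Sepal.Width"))
p <- ggplot(melted, aes(x = variable, y = value, fill = Species)) +
  geom_boxplot(aes(fill = Species, color = Species))
p            # ggplot: boxes are grouped side by side
ggplotly(p)  # plotly: boxes overlap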
Upvotes: 1
Views: 1106
Reputation: 1236
I was not sure what you meant by "The separate boxes are not identical". When I replicated your example I noticed two issues:
First issue:
It seems (from the links below) that you can group the boxes by just changing your last line to:
ggplotly(p) %>% layout(boxmode = "group")
On my machine (R version 3.5.3, ggplot2 3.1.0, plotly 4.8.0) this throws a warning saying that "'layout' objects don't have these attributes: 'boxmode'", but the third link in the list says that you can just ignore it.
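For reference, here is the whole thing as one script, as a minimal sketch. It assumes melt() comes from the reshape2 package (the question does not say which package was used), and the name fig is just an illustrative variable.

```r
library(reshape2)  # assumption: melt() is taken from reshape2
library(ggplot2)
library(plotly)

# Long format, restricted to the two sepal measurements
melted <- melt(iris, id.vars = "Species")
melted <- subset(melted, variable %in% c("Sepal.Length", "Sepal.Width"))

p <- ggplot(melted, aes(x = variable, y = value, fill = Species)) +
  geom_boxplot(aes(color = Species))

# Ask plotly to place the boxes side by side instead of overlaying them.
# Depending on the plotly version, this may produce the 'boxmode' warning
# mentioned above.
fig <- ggplotly(p) %>% layout(boxmode = "group")

# Type `fig` at the console to render the plot.
```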
Second issue:
This one I noticed only after posting the first answer. Apparently the geom_boxplot function from ggplot2 uses a different definition for the limits of the boxes and whiskers. Take a look at ?geom_boxplot. It says:
The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles). This differs slightly from the method used by the boxplot() function, and may be apparent with small samples. See boxplot.stats() for for more information on how hinge positions are calculated for boxplot().
The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge. Data beyond the end of the whiskers are called "outlying" points and are plotted individually.
When I used boxplot() to create the graph, I got something similar to the boxplot in plotly. So plotly appears to use the same method as boxplot(), which appears to be slightly different from geom_boxplot(). I am not sure, but I think it may be a matter of one of them using < and the other <= to interpret "largest value no further than 1.5 * IQR".
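To see the hinge difference that the help page describes, you can compare the two calculations in base R. This is only a sketch: the vector x is made-up data, and I am assuming (based on the quoted help text) that geom_boxplot() takes the 25th and 75th percentiles as quantile() computes them by default, while boxplot() uses the hinges from boxplot.stats(). It says nothing about what plotly does internally.

```r
# Hypothetical small sample; differences are easiest to see with few points
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 20)

# First and third quartiles, as described in ?geom_boxplot
quartiles <- unname(quantile(x, c(0.25, 0.75)))

# Lower and upper hinges as used by boxplot(); stats holds
# lower whisker, lower hinge, median, upper hinge, upper whisker
hinges <- boxplot.stats(x)$stats[c(2, 4)]

print(quartiles)
print(hinges)

# For this sample the two definitions do not agree
identical(quartiles, hinges)
```

With larger samples the two usually get closer, which matches the "may be apparent with small samples" remark in the help text.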
Hope this helps.
Upvotes: 3