Reputation: 2394
I'm trying to create a histogram which involves a lot of repeated values in one of the cases. One of the data points is not being represented in the graph. Here is the smallest, simplest subset I could find that still reproduced my issue.
library(plotly)
library(data.table)
cleanVar <- c(rep(1, 9), 1.25, 1.5)
plot_ly(data.table(cleanVar),
        x = ~cleanVar,
        type = "histogram")
The resulting graph shows only two bars: one centered at 1 with height 9, and one centered at 1.2 with height 1.
Strangely, the hover text shows "1" for the first bar even though it covers the range [0.9, 1.1], and "1.25" for the second bar even though it covers the range [1.1, 1.3].
If we repeat the 1 only 8 times, `cleanVar <- c(rep(1,8),1.25,1.5)`, so that there are 10 values in total, it works better. Still, the three bins it creates are 0.25 wide according to the hover text, yet only 0.2 wide on the graph itself.
What is plotly doing? How can I properly show 3 bins of height 9, 1, 1 and width 0.25? The binning options in `layout()` aren't working.
Reputation: 33510
By default plotly uses the following procedure to define the bins:
start:
Sets the starting value for the x axis bins. Defaults to the minimum data value, shifted down if necessary to make nice round values and to remove ambiguous bin edges. For example, if most of the data is integers we shift the bin edges 0.5 down, so a `size` of 5 would have a default `start` of -0.5, so it is clear that 0-4 are in the first bin, 5-9 in the second, but continuous data gets a start of 0 and bins [0,5), [5,10) etc. Dates behave similarly, and `start` should be a date string. For category data, `start` is based on the category serial numbers, and defaults to -0.5. If multiple non-overlaying histograms share a subplot, the first explicit `start` is used exactly and all others are shifted down (if necessary) to differ from that one by an integer number of bins.
end:
Sets the end value for the x axis bins. The last bin may not end exactly at this value, we increment the bin edge by `size` from `start` until we reach or exceed `end`. Defaults to the maximum data value. Like `start`, for dates use a date string, and for category data `end` is based on the category serial numbers.
You can find this information via:
library(plotly)      # provides schema()
library(listviewer)  # provides the interactive JSON viewer
schema(jsonedit = interactive())
Navigate as follows: object ► traces ► histogram ► attributes ► xbins ► start
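The same descriptions can also be read without the interactive viewer. This is a minimal sketch, assuming that `schema()` returns the schema as a nested list whose element names mirror the navigation path above, and that each attribute carries a `description` field. That matches my understanding of the plotly R package, but check it against your installed version.

```r
# Sketch: read the xbins documentation straight from the plotly schema.
# Assumption: schema() returns a nested list that mirrors the path
# traces > histogram > attributes > xbins shown in the viewer.
xbins_doc <- NULL
if (requireNamespace("plotly", quietly = TRUE)) {
  s <- plotly::schema(jsonedit = FALSE)  # FALSE skips the listviewer widget
  xbins_doc <- s$traces$histogram$attributes$xbins
  # Print the documented behaviour of each binning attribute.
  for (attr_name in c("start", "end", "size")) {
    cat(attr_name, ":\n", xbins_doc[[attr_name]]$description, "\n\n", sep = "")
  }
}
```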
To avoid the default behaviour, just make your `x` variable a factor:
library(plotly)
library(data.table)
cleanVar <- c(rep(1, 9), 1.25, 1.5)
plot_ly(data.table(cleanVar),
x = ~factor(cleanVar),
type = "histogram")
Result:
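If the numeric axis should be kept, another option is to set the bins explicitly on the trace via `xbins`. These are trace attributes, which may be why binning options passed to `layout()` had no effect. The sketch below assumes bin edges at 0.875, 1.125, 1.375 and 1.625, so that each value sits in the middle of a 0.25-wide bin. Those edge values are my own choice for illustration, not something given in the question.

```r
cleanVar <- c(rep(1, 9), 1.25, 1.5)

# Assumed bin layout: width 0.25, first edge half a bin below the minimum,
# so 1, 1.25 and 1.5 each fall in the middle of their own bin.
bin_size  <- 0.25
bin_start <- min(cleanVar) - bin_size / 2   # 0.875
bin_end   <- max(cleanVar) + bin_size / 2   # 1.625

# Check the counts per bin with base R before plotting.
breaks <- seq(bin_start, bin_end, by = bin_size)
counts <- as.vector(table(cut(cleanVar, breaks = breaks, right = FALSE)))
print(counts)

# Pass the same bins to the histogram trace (only if plotly is installed).
p <- NULL
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(
    x = cleanVar,
    type = "histogram",
    xbins = list(start = bin_start, end = bin_end, size = bin_size)
  )
}
```

Print `p` to render the figure. The base R check is there only to confirm that the chosen edges put nine values in the first bin and one in each of the other two.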
Upvotes: 1