Ælex

Reputation: 14839

gnuplot scale plot function to same height

I am drawing distribution curves of three different datasets. They have different means and standard deviations, and thus different curves. However, the curves reach very different heights when plotted in the same graph.

I use the normal curve function:

std_b=0.1674
mu_b=.6058
mu_j=0.8955
std_j=0.0373
mu_s=0.9330
std_s=0.0240
normal(x,mu,sd) = (1/(sd*sqrt(2*pi)))*exp(-(x-mu)**2/(2*sd**2))
plot normal(x,mu_b,std_b) w boxes title "Boolean",\
normal(x,mu_j,std_j) w boxes title "Jaccard",\
normal(x,mu_s,std_s) w boxes title "Sorensen"

However, the scale of the curves is off, as seen by the difference in height along the Y axis. How can I scale each plot function so that they all reach the same Y height?


Upvotes: 1

Views: 236

Answers (1)

Matthew

Reputation: 7590

In general, you can't.

These are probability density functions, which means that they must be non-negative and must have an area of exactly 1 under the curve (the formal definition is a little more technical, but that is the statistics 101 definition). Because of that, when you make the curve less spread out (spread is what the standard deviation measures), you must make the peak in the middle higher in order to preserve the area.
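To put numbers on that for the curves in the question, here is a small sketch that prints each peak height. It relies only on the fact that the normal() function from the question has its maximum at x = mu, where the exponential equals 1, so the peak is 1/(sd*sqrt(2*pi)). The helper name peak() is made up for illustration.

```gnuplot
# Standard deviations copied from the question
std_b = 0.1674
std_j = 0.0373
std_s = 0.0240

# Hypothetical helper: height of the normal curve at x = mu
peak(sd) = 1/(sd*sqrt(2*pi))

# Print the peak height of each curve
print "Boolean:  ", peak(std_b)
print "Jaccard:  ", peak(std_j)
print "Sorensen: ", peak(std_s)
```

The three values should come out at roughly 2.4, 10.7 and 16.6, which is the height difference visible in the plot.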

If it helps to visualize it, think of a finite distribution in the shape of an isosceles triangle.

[Figure: Sample Distributions]

Both the purple and green triangles form perfectly valid probability distributions. The purple distribution has a base of length 10 (from 0 to 10) and a height of 1/5, giving an area of 1. If I want to make it cover a smaller range (which again is basically what the standard deviation is doing in your normal curves), I push the sides together (in this case to a base of length 6, from 2 to 8), but in order to preserve the area of 1, I have to make the triangle taller (in this case a height of 1/3). If I kept the same height of 1/5, the area would be only 0.6, which is less than 1.
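The triangle arithmetic can be checked directly in gnuplot. This is only a sketch; tri_area() is a made-up helper implementing area = base * height / 2. Note that the heights are written as 1.0/5 and 1.0/3, because gnuplot does integer division on integer operands and 1/5 would evaluate to 0.

```gnuplot
# Hypothetical helper: area of a triangle from its base and height
tri_area(base, height) = 0.5 * base * height

print tri_area(10, 1.0/5)   # wide triangle, base from 0 to 10
print tri_area(6,  1.0/3)   # narrow triangle, base from 2 to 8
print tri_area(6,  1.0/5)   # narrow base but the old height: area drops below 1
```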

In your normal distributions, the height is controlled by the scale factor in front of the exponential, 1/(sd*sqrt(2*pi)). Getting rid of that factor, or setting it to the same value for all three curves, will give them the same height, but they will no longer be probability distributions, as the area will not be 1. In general, for a normal distribution, the smaller the standard deviation, the taller the peak.
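If equal heights matter more to you than having a true density, one way to do it is to drop the prefactor, as in the sketch below. The function name normal_unit() is hypothetical, and the xrange, yrange and samples settings are assumptions chosen to fit the values in the question. I have used lines rather than boxes here, but either style should work.

```gnuplot
# Parameters copied from the question
mu_b = 0.6058; std_b = 0.1674
mu_j = 0.8955; std_j = 0.0373
mu_s = 0.9330; std_s = 0.0240

# Hypothetical helper: the normal curve without its 1/(sd*sqrt(2*pi)) factor.
# At x = mu the exponential is exp(0) = 1, so every curve peaks at 1.
# These are no longer probability densities (the area is not 1).
normal_unit(x,mu,sd) = exp(-(x-mu)**2/(2*sd**2))

set samples 1000        # assumed: enough points to resolve the narrow peaks
set xrange [0:1.2]      # assumed range covering all three means
set yrange [0:1.1]

plot normal_unit(x,mu_b,std_b) w lines title "Boolean",\
     normal_unit(x,mu_j,std_j) w lines title "Jaccard",\
     normal_unit(x,mu_s,std_s) w lines title "Sorensen"
```

The shapes and positions stay the same as in the original plot; only the vertical scale of each curve changes.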

Upvotes: 2
