VincentN

Reputation: 111

Getting a curve (not normally distributed) on top of a histogram

I am trying to draw a curve on top of a histogram. However, the curve starts at y=0 at some negative x value, whereas it should begin at x=0, where the frequency is highest.

These are the values of data:

 [1] 0.41645505 0.17807010 0.04401494 0.00000000 0.53424325 0.00000000 0.78833026 0.14429310 0.00000000 0.35345068 0.00000000 0.00000000
[13] 0.03157549 0.00000000 0.00000000 0.83979615 0.15510495 0.00000000 0.00000000 0.38146542 0.60273251 0.28437203 0.00000000 0.00000000
[25] 0.63672858 0.00000000 0.28479730 0.00000000 0.73017781 0.39795789 0.00000000 0.00000000 0.56448031 0.00000000 0.92790850 0.00000000
[37] 0.00000000 0.46136357 0.27828194 0.00000000 0.01385383 0.36895497 0.06200592 0.00000000 0.17517336 0.57521911 0.00000000 0.32508820
[49] 0.00000000 0.00000000
hist(data)

The histogram that is produced is fine. However, when I tried to plot a curve on top:

plot(density(data))

it produced a plot that starts at (-0.2, 0), even though none of the values in data is negative.

I want a curve/line on the top of the bars in the histogram.

Upvotes: 0

Views: 172

Answers (2)

Ben Bolker

Reputation: 226182

tl;dr use from=0 in your density() call to restrict the range. (Don't forget to use freq=FALSE or prob=TRUE in your hist() call to scale the histogram to densities rather than counts.)

Data:

dat  <- c(0.41645505,0.17807010,0.04401494,0.00000000, 0.53424325,
          0.00000000,0.78833026,0.14429310,0.00000000,0.35345068,
          0.00000000,0.00000000,0.03157549,0.00000000,0.00000000,
          0.83979615,0.15510495,0.00000000,0.00000000,0.38146542,
          0.60273251,0.28437203,0.00000000,0.00000000,0.63672858,
          0.00000000,0.28479730,0.00000000,0.73017781,0.39795789,
          0.00000000,0.00000000,0.56448031,0.00000000,0.92790850,
          0.00000000,0.00000000,0.46136357,0.27828194,0.00000000,
          0.01385383,0.36895497,0.06200592,0.00000000,0.17517336,
          0.57521911,0.00000000,0.32508820,0.00000000,0.00000000)

Using from=0 in density() tells R to start the output from 0. If you want a wigglier, less-smooth line, you can lower the adjust argument to density(), which multiplies the default bandwidth. @RuiBarradas's answer shows you how to put a smooth line through the midpoints of the tops of the histogram bars, although arguably this doesn't make much theoretical sense as a way to characterize the density.

par(las=1)
hist(dat,freq=FALSE,col="gray", main="")
lines(density(dat, from=0),col=2,lwd=2)
lines(density(dat, from=0, adjust=0.25),col=4,lwd=2)
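
To see where the negative x values come from and what the two arguments actually change, the following sketch may help. It relies on the documented defaults of density() (cut = 3, so the evaluation grid extends three bandwidths beyond the range of the data); the names h, d_default, d_from0 and area_from0 are only illustrative and not part of the original answer.

```r
dat <- c(0.41645505, 0.17807010, 0.04401494, 0.00000000, 0.53424325,
         0.00000000, 0.78833026, 0.14429310, 0.00000000, 0.35345068,
         0.00000000, 0.00000000, 0.03157549, 0.00000000, 0.00000000,
         0.83979615, 0.15510495, 0.00000000, 0.00000000, 0.38146542,
         0.60273251, 0.28437203, 0.00000000, 0.00000000, 0.63672858,
         0.00000000, 0.28479730, 0.00000000, 0.73017781, 0.39795789,
         0.00000000, 0.00000000, 0.56448031, 0.00000000, 0.92790850,
         0.00000000, 0.00000000, 0.46136357, 0.27828194, 0.00000000,
         0.01385383, 0.36895497, 0.06200592, 0.00000000, 0.17517336,
         0.57521911, 0.00000000, 0.32508820, 0.00000000, 0.00000000)

# freq = FALSE draws h$density instead of h$counts: the bars are rescaled
# so that their total area is 1, the same scale as a density curve
h <- hist(dat, plot = FALSE)
sum(h$density * diff(h$breaks))   # total area of the bars
range(h$counts)                   # raw counts are on a different scale

# By default the grid starts cut * bw (cut = 3) below the smallest value,
# which is why the curve begins at a negative x even though dat >= 0
d_default <- density(dat)
min(d_default$x)
min(dat) - 3 * d_default$bw

# from = 0 only changes where the curve is evaluated; the bandwidth,
# and hence the smoothness, is unchanged
d_from0 <- density(dat, from = 0)
c(d_default$bw, d_from0$bw)

# Rough Riemann-sum area under the curve drawn from 0: the kernel mass
# that falls below 0 is cut off, not redistributed
area_from0 <- sum(d_from0$y) * diff(d_from0$x[1:2])
area_from0
```

Because from=0 truncates the curve rather than re-estimating it, the area under the drawn line is less than 1 for these data. That is usually fine for a visual overlay, but worth keeping in mind if the curve is used for anything quantitative.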


Upvotes: 4

Dij

Reputation: 1378

Using lattice, you can visualize the distribution within each histogram bin.

Suppose your ordinary histogram is produced as follows:

dat  <- c(0.41645505,0.17807010,0.04401494,0.00000000, 0.53424325,
          0.00000000,0.78833026,0.14429310,0.00000000,0.35345068,
          0.00000000,0.00000000,0.03157549,0.00000000,0.00000000,
          0.83979615,0.15510495,0.00000000,0.00000000,0.38146542,
          0.60273251,0.28437203,0.00000000,0.00000000,0.63672858,
          0.00000000,0.28479730,0.00000000,0.73017781,0.39795789,
          0.00000000,0.00000000,0.56448031,0.00000000,0.92790850,
          0.00000000,0.00000000,0.46136357,0.27828194,0.00000000,
          0.01385383,0.36895497,0.06200592,0.00000000,0.17517336,
          0.57521911,0.00000000,0.32508820,0.00000000,0.00000000)
dat.hist <- hist(dat, breaks =6, border = "white", col="gray",main = "")
plot(dat.hist)


You can visualize the distribution within each bin using:

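library(lattice)
# include.lowest = TRUE keeps the exact zeros in the first bin, as hist() does;
# by default cut() would turn them into NA and drop them from the panels
densityplot( ~ dat | cut(dat, breaks = dat.hist$breaks, include.lowest = TRUE),
             layout = c(5, 1))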
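
One detail worth checking when conditioning on cut() is how values that sit exactly on the lowest break are handled. The sketch below (the names h, bins_default and bins are illustrative, not from the answer) compares the bin counts reported by hist() with those obtained from cut(). hist() includes the lowest break in the first bin by default, whereas cut() uses left-open intervals, so the exact zeros become NA unless include.lowest = TRUE is given.

```r
dat <- c(0.41645505, 0.17807010, 0.04401494, 0.00000000, 0.53424325,
         0.00000000, 0.78833026, 0.14429310, 0.00000000, 0.35345068,
         0.00000000, 0.00000000, 0.03157549, 0.00000000, 0.00000000,
         0.83979615, 0.15510495, 0.00000000, 0.00000000, 0.38146542,
         0.60273251, 0.28437203, 0.00000000, 0.00000000, 0.63672858,
         0.00000000, 0.28479730, 0.00000000, 0.73017781, 0.39795789,
         0.00000000, 0.00000000, 0.56448031, 0.00000000, 0.92790850,
         0.00000000, 0.00000000, 0.46136357, 0.27828194, 0.00000000,
         0.01385383, 0.36895497, 0.06200592, 0.00000000, 0.17517336,
         0.57521911, 0.00000000, 0.32508820, 0.00000000, 0.00000000)

# Same binning as the histogram, computed without drawing anything
h <- hist(dat, breaks = 6, plot = FALSE)

# Default cut(): the first interval excludes its left endpoint,
# so every exact zero is assigned NA
bins_default <- cut(dat, breaks = h$breaks)
sum(is.na(bins_default))   # observations that would be dropped
sum(dat == 0)              # number of exact zeros in the data

# With include.lowest = TRUE the bins agree with the histogram counts
bins <- cut(dat, breaks = h$breaks, include.lowest = TRUE)
table(bins)
h$counts
```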


Upvotes: 0
