Mark

Reputation: 1769

Density plot produces too steep a curve

I have the following vector:

> dput(x)
c(-0.355351681957187, -0.169491525423729, 0.31683516598051, 0.283387622149837, 
  -0.0404040404040404, 0, -0.333333333333333, 0.0235294117647059, 
   0, 0, 0, -0.0515442883011552, -0.0217391304347826, -0.243119266055046, 
  -1, -0.34239692625979, -0.378787878787879, -1.66260162601626, 
   0, -0.157894736842105, 0.25, -0.5, 0.801104290693729, -0.153153153153153, 
   0.385314991342733, 0.214285714285714, 0.133333333333333, 0.677407583111338, 
   0.125, 0.0152671755725191, 0.00103734439834025, 0, 0.25, -0.181818181818182, 
   0, 0.555555555555556, -1.2671374117353, -0.72, -0.0896999113268307, 
  -0.0392156862745098, 0.987184805152276, 0.986975072984505, -0.120978120978121, 
  -0.554949337490257, -0.333333333333333, -1030.48879660578, 0.192660550458716, 
   0, 0, -1.04154941234895, -0.82051282051282, -0.0282485875706215, 
   0.63226571767497, 0.0881147540983607, 0, 0.458823529411765, 0.338449445639583, 
  -5.55556433141142, 0.225536180110256, 0.249441548771407, -0.11864406779661, 
  -3.76193507320178, -4.75, -1.10223741454319, -0.689922480620155, 
  -2.04782608695652, -3.04521276595745, -0.741007194244604, 0.989690721649485, 
   0.314224446032881, 0.314285714285714, 0.251685393258427, -0.00608418155402266, 
  -0.0368893320039882, -0.00990683783542832, -0.0166666666666667, 
  -0.0857142857142857, 0, 0.144337527757217, 0.221153846153846, 
  -0.0560747663551402, 0, -1.8, 0.2243947858473, 0.166666666666667, 
   0, 0.0344827586206897, 0.561461794019934, 0.458333333333333, 
  -1.2921686746988, -1.20289855072464, -0.156601842374616, -0.144578313253012, 
  -0.0310077519379845, 0.163688058489033, 0.12621359223301, -481.395976223137, 
   0.376470588235294, -0.222222222222222, -0.209553158705701, -0.128205128205128, 
   0, 0.0693069306930693, 0.0293463761671854)

I plotted the density of x, in this way:

d0<-density(x)
df_density0<-data.frame(x=d0$x,y=d0$y,stringsAsFactors = FALSE)
plot(df_density0$x,df_density0$y,type="l",col="red")

obtaining

[1]: https://i.sstatic.net/HHlCU.png

The peak at 0 is very narrow and outside of it the curve is flat. Thus the graph turns out to be unclear. I had thought about using logarithmic scales to make the peak less steep, and improve the readability of the graph, but there are too many zeros.
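For reference, a log-like axis that tolerates zeros and negative values can be sketched with the inverse hyperbolic sine, asinh(), which is roughly linear near 0 and roughly logarithmic for large magnitudes. This is only one possible workaround, not something tried in the question; the short vector below is a made-up stand-in for x, and the tick positions are arbitrary choices.

```r
# Stand-in data (assumption): a few values shaped like x, including zeros,
# negatives, and two extreme outliers
x <- c(-1030.49, -481.40, -5.56, -1.66, -0.36, 0, 0, 0.25, 0.32, 0.8, 0.99)

# asinh() is defined for zero and negative input, unlike log()
x_t <- asinh(x)

# Estimate the density on the transformed scale
d_t <- density(x_t)

# Plot without the default x axis, then label ticks in the original units
plot(d_t$x, d_t$y, type = "l", col = "red", xaxt = "n",
     xlab = "x (asinh scale)", ylab = "density")
ticks <- c(-1000, -100, -10, -1, 0, 1)   # arbitrary tick choice
axis(1, at = asinh(ticks), labels = ticks)
```

Note that the curve is then a density of asinh(x), not of x itself, so the heights are not comparable with the original plot.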

Upvotes: 0

Views: 276

Answers (2)

Chris Ruehlemann

Reputation: 21432

The curve is so steep because your data contains some extreme outliers. You can identify them by drawing a boxplot of the data and storing the result as an object (assuming that your data is called dt):

out <- boxplot(dt)  # store the boxplot as an object
out$out             # inspect the outliers
 [1]    -1.0000000    -1.6626016    -1.2671374     0.9871848     0.9869751 -1030.4887966
 [7]    -1.0415494    -5.5555643    -3.7619351    -4.7500000    -1.1022374    -2.0478261
[13]    -3.0452128     0.9896907    -1.8000000    -1.2921687    -1.2028986  -481.3959762

You can then remove the outliers from dt and plot again using hist (note that freq must be set to FALSE if you want to add a density line), and superimpose a density line (play around with bw to adjust the shape of the density curve):

hist(dt[!dt %in% out$out], freq = FALSE)
lines(density(dt[!dt %in% out$out], kernel="cosine", bw = 0.1))
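If you only need the outlier values and not the boxplot itself, the same steps can be sketched with boxplot.stats(), which applies the 1.5 * IQR whisker rule that boxplot() uses by default. The short vector below is a made-up stand-in for dt, and dt_clean and out_vals are names introduced here for illustration.

```r
# Stand-in data (assumption): a few values shaped like dt, with two extreme outliers
dt <- c(-0.36, -0.17, 0.32, 0.28, 0, 0.02, -0.05, 0.25, -0.5, 0.13,
        -1030.49, -481.40)

# Values beyond the whiskers (default coef = 1.5), without drawing a plot
out_vals <- boxplot.stats(dt)$out

# Keep only the values that were not flagged as outliers
dt_clean <- dt[!dt %in% out_vals]

# Histogram on the density scale with a kernel density line on top
hist(dt_clean, freq = FALSE)
lines(density(dt_clean, bw = 0.1))
```

Storing the cleaned vector once avoids repeating the dt[!dt %in% out$out] expression in both plotting calls.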


Upvotes: 1

Sirius

Reputation: 5429

How about just looking at a more relevant range of x?


x2 <- x[ x > -3 ]

d0<-density(x2)
df_density0<-data.frame(x=d0$x,y=d0$y,stringsAsFactors = FALSE)

plot(
    df_density0$x,
    df_density0$y,
    type="l",
    col="red"
)


Then add a note saying that this graph leaves out 6 of the 104 measurements, and state the range those omitted values span.
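The text of such a note can be computed rather than typed by hand. Below is a minimal sketch, assuming the same cutoff of -3; the short vector is a made-up stand-in for x, and cutoff, dropped and note are names introduced here for illustration.

```r
# Stand-in data (assumption): replace with the full vector x from the question
x <- c(-0.36, 0.32, 0, -1.66, 0.8, -5.56, -1030.49, -481.40, 0.25)

cutoff  <- -3                 # same threshold as above
dropped <- x[x <= cutoff]     # values excluded from the plot

# Build the note from the counts and the range of the excluded values
note <- sprintf(
  "Not shown: %d of %d measurements (range %.2f to %.2f)",
  length(dropped), length(x), min(dropped), max(dropped)
)

# Plot the density of the retained values and print the note under the axis
d0 <- density(x[x > cutoff])
plot(d0$x, d0$y, type = "l", col = "red", sub = note)
```

With the stand-in data the note reads "Not shown: 3 of 9 measurements (range -1030.49 to -5.56)".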

Upvotes: 1
