Density plot produces too steep a curve

Question

I have the following vector:

> dput(x)
c(-0.355351681957187, -0.169491525423729, 0.31683516598051, 0.283387622149837, 
  -0.0404040404040404, 0, -0.333333333333333, 0.0235294117647059, 
   0, 0, 0, -0.0515442883011552, -0.0217391304347826, -0.243119266055046, 
  -1, -0.34239692625979, -0.378787878787879, -1.66260162601626, 
   0, -0.157894736842105, 0.25, -0.5, 0.801104290693729, -0.153153153153153, 
   0.385314991342733, 0.214285714285714, 0.133333333333333, 0.677407583111338, 
   0.125, 0.0152671755725191, 0.00103734439834025, 0, 0.25, -0.181818181818182, 
   0, 0.555555555555556, -1.2671374117353, -0.72, -0.0896999113268307, 
  -0.0392156862745098, 0.987184805152276, 0.986975072984505, -0.120978120978121, 
  -0.554949337490257, -0.333333333333333, -1030.48879660578, 0.192660550458716, 
   0, 0, -1.04154941234895, -0.82051282051282, -0.0282485875706215, 
   0.63226571767497, 0.0881147540983607, 0, 0.458823529411765, 0.338449445639583, 
  -5.55556433141142, 0.225536180110256, 0.249441548771407, -0.11864406779661, 
  -3.76193507320178, -4.75, -1.10223741454319, -0.689922480620155, 
  -2.04782608695652, -3.04521276595745, -0.741007194244604, 0.989690721649485, 
   0.314224446032881, 0.314285714285714, 0.251685393258427, -0.00608418155402266, 
  -0.0368893320039882, -0.00990683783542832, -0.0166666666666667, 
  -0.0857142857142857, 0, 0.144337527757217, 0.221153846153846, 
  -0.0560747663551402, 0, -1.8, 0.2243947858473, 0.166666666666667, 
   0, 0.0344827586206897, 0.561461794019934, 0.458333333333333, 
  -1.2921686746988, -1.20289855072464, -0.156601842374616, -0.144578313253012, 
  -0.0310077519379845, 0.163688058489033, 0.12621359223301, -481.395976223137, 
   0.376470588235294, -0.222222222222222, -0.209553158705701, -0.128205128205128, 
   0, 0.0693069306930693, 0.0293463761671854)

I plotted the density of x like this:

d0 <- density(x)
df_density0 <- data.frame(x = d0$x, y = d0$y, stringsAsFactors = FALSE)
plot(df_density0$x, df_density0$y, type = "l", col = "red")

In the resulting plot, the peak at 0 is very narrow and the curve is flat everywhere else, so the graph is hard to read. I thought about using a logarithmic scale to make the peak less steep and improve readability, but the data contain too many zeros.

Chris Ruehlemann · Accepted Answer

The curve is so steep because your data contain some extreme outliers. You can identify them by drawing a boxplot of the data and storing the result as an object (assuming that your data is called dt):

out <- boxplot(dt)  # store the boxplot as an object
out$out             # inspect the outliers
 [1]    -1.0000000    -1.6626016    -1.2671374     0.9871848     0.9869751 -1030.4887966
 [7]    -1.0415494    -5.5555643    -3.7619351    -4.7500000    -1.1022374    -2.0478261
[13]    -3.0452128     0.9896907    -1.8000000    -1.2921687    -1.2028986  -481.3959762
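
As a side note, boxplot() flags a value as an outlier when it lies more than 1.5 times the box length beyond the hinges, and boxplot.stats() applies the same rule without drawing anything. The following is only a minimal sketch of that rule: the vector dt here is a small illustrative sample of rounded values from the question, not the full data, and the names st, hinges and fences are made up for the example.

```r
# Small illustrative sample: rounded values taken from the question's vector,
# NOT the full data set.
dt <- c(-0.36, -0.17, 0.32, 0.28, -0.04, 0, -0.33, 0.02, 0, -0.05,
        -0.24, 0.25, -0.5, 0.19, 0.13, -1030.49, -481.40)

# boxplot.stats() computes the boxplot statistics without plotting.
# coef = 1.5 is the default multiplier, the same one boxplot() uses.
st <- boxplot.stats(dt, coef = 1.5)

# Elements 2 and 4 of $stats are the lower and upper hinge (roughly Q1 and Q3).
hinges <- st$stats[c(2, 4)]

# Values outside [lower hinge - 1.5 * box length, upper hinge + 1.5 * box length]
# are reported as outliers.
fences <- hinges + c(-1.5, 1.5) * diff(hinges)

fences   # the interval outside which values are flagged
st$out   # the flagged values, same as boxplot(dt)$out
```

Printing fences next to out$out makes it easier to judge whether values such as -1 or 0.99 really deserve to be dropped, or only the two values in the hundreds.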

You can then remove the outliers from dt and plot again using hist (note that freq must be set to FALSE if you want to add a density line), and superimpose a density line on the histogram (play around with bw to change the shape of the density curve):

hist(dt[!dt %in% out$out], freq = FALSE)
lines(density(dt[!dt %in% out$out], kernel="cosine", bw = 0.1))
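
If you would rather not drop any observations, a possible alternative (not part of the approach above, just a sketch) is to keep all the data and only restrict the grid on which the density is evaluated, using the from and to arguments of density(). The range [-2, 2] and the sample vector below are assumptions chosen for illustration; dt again holds a few rounded values from the question rather than the full data.

```r
# Small illustrative sample: rounded values taken from the question's vector,
# NOT the full data set.
dt <- c(-0.36, -0.17, 0.32, 0.28, -0.04, 0, -0.33, 0.02, 0, -0.05,
        -0.24, 0.25, -0.5, 0.19, 0.13, -1030.49, -481.40)

# All points still contribute to the estimate, but the default 512 grid points
# are placed on [-2, 2] instead of being spread over the whole range of the
# data (which reaches down to about -1030).
d <- density(dt, from = -2, to = 2)

# Same plotting style as in the question.
plot(d$x, d$y, type = "l", col = "red", xlab = "x", ylab = "density")
```

Because the outliers are still part of the estimate, the area under the plotted part of the curve is slightly less than 1. Adjust from and to to whatever central range you care about.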
