Reputation: 1769
I have the following vector:
> dput(x)
c(-0.355351681957187, -0.169491525423729, 0.31683516598051, 0.283387622149837,
-0.0404040404040404, 0, -0.333333333333333, 0.0235294117647059,
0, 0, 0, -0.0515442883011552, -0.0217391304347826, -0.243119266055046,
-1, -0.34239692625979, -0.378787878787879, -1.66260162601626,
0, -0.157894736842105, 0.25, -0.5, 0.801104290693729, -0.153153153153153,
0.385314991342733, 0.214285714285714, 0.133333333333333, 0.677407583111338,
0.125, 0.0152671755725191, 0.00103734439834025, 0, 0.25, -0.181818181818182,
0, 0.555555555555556, -1.2671374117353, -0.72, -0.0896999113268307,
-0.0392156862745098, 0.987184805152276, 0.986975072984505, -0.120978120978121,
-0.554949337490257, -0.333333333333333, -1030.48879660578, 0.192660550458716,
0, 0, -1.04154941234895, -0.82051282051282, -0.0282485875706215,
0.63226571767497, 0.0881147540983607, 0, 0.458823529411765, 0.338449445639583,
-5.55556433141142, 0.225536180110256, 0.249441548771407, -0.11864406779661,
-3.76193507320178, -4.75, -1.10223741454319, -0.689922480620155,
-2.04782608695652, -3.04521276595745, -0.741007194244604, 0.989690721649485,
0.314224446032881, 0.314285714285714, 0.251685393258427, -0.00608418155402266,
-0.0368893320039882, -0.00990683783542832, -0.0166666666666667,
-0.0857142857142857, 0, 0.144337527757217, 0.221153846153846,
-0.0560747663551402, 0, -1.8, 0.2243947858473, 0.166666666666667,
0, 0.0344827586206897, 0.561461794019934, 0.458333333333333,
-1.2921686746988, -1.20289855072464, -0.156601842374616, -0.144578313253012,
-0.0310077519379845, 0.163688058489033, 0.12621359223301, -481.395976223137,
0.376470588235294, -0.222222222222222, -0.209553158705701, -0.128205128205128,
0, 0.0693069306930693, 0.0293463761671854)
I plotted the density of x in this way:
d0<-density(x)
df_density0<-data.frame(x=d0$x,y=d0$y,stringsAsFactors = FALSE)
plot(df_density0$x,df_density0$y,type="l",col="red")
This produced a density plot (image not reproduced here).
The peak at 0 is very narrow and outside of it the curve is flat. Thus the graph turns out to be unclear. I had thought about using logarithmic scales to make the peak less steep, and improve the readability of the graph, but there are too many zeros.
Upvotes: 0
Views: 276
Reputation: 21432
The curve is so steep because you have some extreme outliers in your data. You can identify them by boxplotting the data and storing the result as an object (assuming that your data is called dt):
out <- boxplot(dt) # store the boxplot as an object
out$out # inspect the outliers
[1] -1.0000000 -1.6626016 -1.2671374 0.9871848 0.9869751 -1030.4887966
[7] -1.0415494 -5.5555643 -3.7619351 -4.7500000 -1.1022374 -2.0478261
[13] -3.0452128 0.9896907 -1.8000000 -1.2921687 -1.2028986 -481.3959762
You can then remove the outliers from dt and plot again using hist (note that freq must be set to FALSE if you want to add a density line), superimposing a density line on top (play around with bw to control how smooth the density curve is):
hist(dt[!dt %in% out$out], freq = FALSE)
lines(density(dt[!dt %in% out$out], kernel="cosine", bw = 0.1))
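If you only need the outlier values and not the boxplot itself, a minimal sketch along the same lines could use boxplot.stats, which applies the same 1.5 x IQR rule without drawing anything. The data below is a made-up stand-in for dt, and the names outliers, keep and dt_trimmed are illustrative only:

```r
# Illustrative data only (not the asker's vector):
# a tight cluster around 0 plus two extreme values.
dt <- c(-0.4, -0.2, -0.1, 0, 0, 0.1, 0.2, 0.3, 0.5, -481.4, -1030.5)

# boxplot.stats() returns the points beyond the whiskers in $out,
# using the same rule as boxplot() but without producing a plot.
outliers <- boxplot.stats(dt)$out

# Logical mask of the values to keep, then the trimmed vector.
keep <- !(dt %in% outliers)
dt_trimmed <- dt[keep]

# Histogram on the density scale with a density line on top.
hist(dt_trimmed, freq = FALSE)
lines(density(dt_trimmed, bw = 0.1))
```

Storing the trimmed vector once avoids repeating the filter expression in both hist and density, and makes it easy to report how many points were dropped with sum(!keep).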
Upvotes: 1
Reputation: 5429
How about just looking at a more relevant range of x?
x2 <- x[ x > -3 ]
d0<-density(x2)
df_density0<-data.frame(x=d0$x,y=d0$y,stringsAsFactors = FALSE)
plot(
df_density0$x,
df_density0$y,
type="l",
col="red"
)
Then add a note saying that this graph leaves out 6 of the 104 measurements, and state the range those omitted values cover.
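The note about the omitted measurements can be built from the data rather than typed by hand. This is only a sketch: the vector below is a small made-up example rather than the asker's x, and cutoff, omitted and note are hypothetical names introduced here.

```r
# Illustrative data only (not the asker's vector).
x <- c(-0.4, -0.2, 0, 0, 0.1, 0.3, 0.5, -3.8, -481.4, -1030.5)

# Same idea as x[x > -3] above, with the threshold in a variable.
cutoff <- -3
omitted <- x[x <= cutoff]
x2 <- x[x > cutoff]

# Build a caption describing what was left out of the plot.
note <- sprintf("Omitted %d of %d values, ranging from %.2f to %.2f",
                length(omitted), length(x), min(omitted), max(omitted))

# plot() on a density object draws the curve; sub adds the note below the axis.
plot(density(x2), col = "red", main = "Density of x (restricted range)", sub = note)
```

Keeping the threshold in one variable means the caption stays correct if you later decide on a different range.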
Upvotes: 1