Azz

Reputation: 41

Plot a clear graph to show the skewness and kurtosis

I'm trying to understand the skewness and kurtosis of a numeric variable, in order to understand the shape of the data.

I first calculate the skewness like this:

skewness(data$responsetime)
[1] 26.56731

And the kurtosis:

kurtosis(data$responsetime)
[1] 3723.961

The skewness is positive, so the tail should extend to the right, and the kurtosis is >= 3.

Now I would like to confirm both the skewness and the kurtosis with a plot. I tried this:

plot(density(data$responsetime))

I get a plot like the one below, from which it is difficult to draw any conclusion. I'm new to R and I'm trying to make this graph clearer, for example by adjusting the range of the x axis, but I can't find the command to do that. Do you know how to do it?

enter image description here

Using a histogram, like this:

hist(data$responsetime, breaks=100)

I also get a graph that is difficult to understand:

enter image description here

With plot(data$responsetime, xlim=c(0, 20000)) I get this:

enter image description here

With plot(density(data$responsetime), xlim=c(0, 20000)) I get the graph below. But I don't understand it: the x axis shows the response time, and the maximum response time given by max(data$responsetime) is 320000, so why does the tail stop around 18000?

enter image description here

Upvotes: 0

Views: 14404

Answers (2)

BigBendRegion

Reputation: 220

Use qqnorm along with qqline - that shows both skewness and kurtosis very clearly.

code:

qqnorm(data$responsetime)

qqline(data$responsetime)

Right skew typically exhibits a convex appearance, left skew typically concave. With excess kurtosis <0, typically the tails are closer to the horizontal mid-line than the qqline predicts; with excess kurtosis >0, typically one or both of the tails is more extreme (farther away from the horizontal mid-line) than the qqline predicts.

Since your skewness is positive, you should see a convex appearance in the qq-plot of your data, with the right tail far above the qqline. This indicates that your distribution produces outliers in the right tail greatly in excess of what the normal distribution predicts.
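As a rough illustration of the Q-Q plot described above, here is a minimal sketch on simulated data. The log-normal sample `x` is only a hypothetical stand-in for data$responsetime, and the names `slope`, `intercept`, and `gap` are introduced here for illustration. The sketch also rebuilds the line that qqline() draws by default (through the first and third quartiles) so the size of the right-tail departure can be read off as a number.

```r
# Hypothetical stand-in for data$responsetime: a log-normal sample with a heavy right tail
set.seed(1)
x <- rlnorm(5000, meanlog = 5, sdlog = 1.5)

# qqnorm() draws the plot and invisibly returns its coordinates:
# qq$x = theoretical normal quantiles, qq$y = sample quantiles
qq <- qqnorm(x, main = "Normal Q-Q plot of simulated response times")
qqline(x)

# Rebuild the default qqline(): a straight line through the 1st and 3rd quartiles
slope <- diff(quantile(x, c(0.25, 0.75))) / diff(qnorm(c(0.25, 0.75)))
intercept <- quantile(x, 0.25) - slope * qnorm(0.25)

# How far the largest observation sits above the line (positive = heavier right tail)
gap <- unname(max(qq$y) - (intercept + slope * max(qq$x)))
gap
```

Running the same three plotting lines on the real column should show whether the points bend upward and leave the line in the upper right corner.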

Kurtosis measures outliers, not the peak of the distribution. That might be a source of confusion for some people when it comes to relating the kurtosis statistic to the histogram.

The logic to understand why kurtosis measures outliers (not peak) is simple: Large |Z|-values indicate outliers. Kurtosis is the average of the Z^4 values. So |Z|-values close to zero (where the peak is) contribute virtually nothing to the kurtosis statistic, and thus the kurtosis statistic is non-informative about the peak. You can have a high kurtosis when the peak is pointy and you can have a high kurtosis when the peak is flat. It all depends on the disposition of the outliers.
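The Z^4 argument above can be checked numerically. The following is a small sketch on simulated data, not on the asker's data: `kurt` is a helper defined here for illustration, using the moment-based (non-excess) definition, which is assumed to match the convention behind the kurtosis() call in the question. It compares a normal sample with the same sample plus one extreme value.

```r
set.seed(1)
x <- rnorm(1000)     # well-behaved sample
x_out <- c(x, 50)    # the same sample plus a single extreme value

# Moment-based kurtosis: the average of z^4, where z is standardized
# with the divide-by-n standard deviation
kurt <- function(v) {
  z <- (v - mean(v)) / sqrt(mean((v - mean(v))^2))
  mean(z^4)
}

kurt(x)      # close to 3 for normal data
kurt(x_out)  # much larger, driven by the one outlier

# Share of the total z^4 sum that comes from the single largest |z|
z <- (x_out - mean(x_out)) / sqrt(mean((x_out - mean(x_out))^2))
share <- max(z^4) / sum(z^4)
share
```

The values near the center of the sample are still there in `x_out`, yet almost all of the statistic comes from one point, which is why the histogram's peak says little about kurtosis.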

Upvotes: 3

Puddlebunk

Reputation: 493

Regarding the hist() function:

hist(data$responsetime, breaks='FD')

I have found that breaks='FD' (the Freedman-Diaconis rule) usually returns enough break points in the histogram to solve this issue. Also, from the graph it looks like you do have a very long tail.

Side note: if your data are that skewed, you may want to consider transforming them before working with them.
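To see what breaks='FD' changes, here is a hedged sketch on simulated data. The log-normal `x` is a hypothetical stand-in for data$responsetime, and the 95th-percentile cutoff is an arbitrary choice made here for illustration, not something from the question. It compares the number of bins under both settings and then draws a second histogram restricted to the bulk of the values so the shape near the peak becomes visible.

```r
# Hypothetical stand-in for data$responsetime
set.seed(1)
x <- rlnorm(5000, meanlog = 5, sdlog = 1.5)

# hist() returns an object whose $counts has one entry per bin
h100 <- hist(x, breaks = 100, main = "breaks = 100")
hfd <- hist(x, breaks = "FD", main = "breaks = 'FD'")
c(bins_100 = length(h100$counts), bins_FD = length(hfd$counts))

# Zoom on the bulk: keep only values up to the 95th percentile.
# The long tail is deliberately left out of this second plot.
cutoff <- quantile(x, 0.95)
hist(x[x <= cutoff], breaks = "FD",
     main = "Values up to the 95th percentile")
```

Subsetting before calling hist() lets the bin width adapt to the zoomed range, whereas passing xlim only crops a plot whose bins were computed from the full range.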
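One common choice of transformation for right-skewed, strictly positive data is the logarithm. The sketch below is only an illustration on simulated data: `x` is a hypothetical stand-in for data$responsetime, `skew` is a helper defined here using the moment-based definition, and whether a log transform suits the real data is an assumption to check. If the real column contains zeros, log() returns -Inf and something like log1p() would be needed instead.

```r
# Hypothetical stand-in for data$responsetime (strictly positive values)
set.seed(1)
x <- rlnorm(5000, meanlog = 5, sdlog = 1.5)

# Moment-based skewness: third central moment over the 1.5 power of the second
skew <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

skew(x)       # strongly positive on the raw scale
skew(log(x))  # near zero after the log transform

# On the log scale the histogram is no longer dominated by a few extreme values
hist(log(x), breaks = "FD", main = "log(response time), simulated")
```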

Upvotes: 0
