Standard Normalization considering Skewness and Kurtosis

Question

I have a rather fundamental statistics question. I know Stack Overflow might not be the perfect place for it, but as a software developer I don't know of any good statistics forums, and Stack Overflow has served me very well in the past.

My problem is the following. I need to standardize some data. I have two different data sets, and after normalization they should share roughly the same distribution. Until now I have used standardization for that (the standard score, (x - mu) / sigma). After transforming all values of both data sets this way, I want the resulting distributions of the transformed values to be practically identical.

This has worked well so far, but now I've run into a problem: one of my two distributions is skewed. Standardization does not account for that, so after normalizing, the means and standard deviations may match, but one distribution is skewed while the other is symmetric.

My question now: is there a known way of doing a standard normalization that also takes skewness and kurtosis into account? One important thing to mention is that my values can also be negative.

I realize this might not be the right forum, so I would also be very happy if someone could point me to a credible statistics forum.

Oli

pjs · Accepted Answer

If your goal is to see whether the two data sets share the same distribution, there's no need to normalize. You should consider using a Q-Q plot. If the data share a common distribution, even with different location and scale parameters, the points will fall fairly close to a straight line.

Generating a Q-Q plot is easy when both sets contain the same number of observations: sort both sets, pair them up in order, and plot the pairs. If the sets differ in size, you first have to interpolate so that both sets supply quantiles at the same probability levels, which takes a bit more work.
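A minimal Python sketch of that pairing step, standard library only. The helper names `qq_pairs` and `_interpolate_sorted` are hypothetical, and the choice to linearly interpolate the larger sorted set down to the size of the smaller one is an assumption (one common option, not the only one).

```python
def _interpolate_sorted(values, n):
    """Linearly interpolate a sorted list down to n evenly spaced points.

    Point j sits at fractional index (m - 1) * j / (n - 1), so the first and
    last points are the minimum and maximum of the original data.
    """
    m = len(values)
    out = []
    for j in range(n):
        pos = (m - 1) * j / (n - 1)   # fractional index in [0, m - 1]
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(values[lo] + frac * (values[hi] - values[lo]))
    return out


def qq_pairs(x, y):
    """Return (x_quantile, y_quantile) pairs for an empirical Q-Q plot."""
    sx, sy = sorted(x), sorted(y)
    if len(sx) < 2 or len(sy) < 2:
        raise ValueError("each data set needs at least two values")
    # Assumed choice: shrink the larger set to the size of the smaller one.
    if len(sx) > len(sy):
        sx = _interpolate_sorted(sx, len(sy))
    elif len(sy) > len(sx):
        sy = _interpolate_sorted(sy, len(sx))
    return list(zip(sx, sy))


# Same shape, different location and scale: y = 2x + 1.
x = [3, 1, 4, 2]
y = [2 * v + 1 for v in x]
for px, py in qq_pairs(x, y):
    print(px, py)  # every pair satisfies py == 2 * px + 1
```

The pairs can be handed to any plotting library; the visual question is simply whether they lie near a straight line.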

In your current case, though, if one of the sets is skewed (based on more than just one or two outliers) and the other is symmetric, they probably come from different distributions.

If your data are normally distributed, then "standardizing" yields a standard normal when the true mean and variance are used for the transformation, and a t-distribution when the sample variance is used. However, since standardizing is a linear transformation, it is shape-preserving: it only shifts and rescales the data, so skewness and kurtosis are unchanged. If your data are not normal, the standard transformation will not magically make them bell-shaped and symmetric.
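A small Python sketch illustrating the shape-preservation point, standard library only. The two synthetic data sets (a normal sample and an exponential sample shifted so that many values are negative) and the helper names `standardize`, `skewness` and `excess_kurtosis` are assumptions made for illustration; the moment estimators used are the simple biased ones.

```python
import random
import statistics


def standardize(xs):
    """Standard score (x - mean) / stdev, using sample estimates."""
    mu = statistics.fmean(xs)
    sigma = statistics.stdev(xs)
    return [(x - mu) / sigma for x in xs]


def _central_moment(xs, k):
    mu = statistics.fmean(xs)
    return statistics.fmean([(x - mu) ** k for x in xs])


def skewness(xs):
    """Biased moment estimate of skewness: m3 / m2**1.5."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Biased moment estimate of excess kurtosis: m4 / m2**2 - 3."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3.0


rng = random.Random(0)
symmetric = [rng.gauss(10.0, 2.0) for _ in range(10_000)]
# Right-skewed sample, shifted so that many values are negative.
skewed = [rng.expovariate(0.5) - 5.0 for _ in range(10_000)]

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    z = standardize(data)
    # Mean and sd become 0 and 1, but skewness is identical before and after.
    print(f"{name}: mean={statistics.fmean(z):.3f} sd={statistics.stdev(z):.3f} "
          f"skew before={skewness(data):.3f} after={skewness(z):.3f}")
```

Both standardized sets end up with mean 0 and standard deviation 1, yet the skewed one keeps its skewness (and kurtosis) exactly, which is the situation described in the question.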

The only transformation I'm aware of that reliably yields the same reference distribution is conversion to quantiles. It's a well-known result (the probability integral transform) that if a random variable X has a continuous, invertible CDF F_X, then F_X(X) ~ U(0,1); that is, mapping the X values through their own CDF yields quantile levels uniformly distributed on (0,1). To apply this as a transformation, you have to know the correct CDF. That's where Q-Q plots are quite clever: if two data sets have the same underlying distribution, their quantiles will line up with each other whether or not you know what that distribution actually is.
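A minimal Python sketch of the probability integral transform, assuming (purely for illustration) that the data are exponential with a known scale of 3. The helper names `exponential_cdf` and `probability_integral_transform` are hypothetical. It also shows why the correct CDF matters: plugging in the wrong scale gives values that are clearly not uniform.

```python
import math
import random
import statistics


def exponential_cdf(x, scale):
    """CDF of an exponential distribution whose mean is `scale`."""
    return 1.0 - math.exp(-x / scale) if x >= 0 else 0.0


def probability_integral_transform(samples, cdf):
    """Map each sample through a CDF; correct CDF -> roughly U(0, 1)."""
    return [cdf(x) for x in samples]


rng = random.Random(42)
true_scale = 3.0
raw = [rng.expovariate(1.0 / true_scale) for _ in range(20_000)]

# Correct CDF: values spread evenly over (0, 1), mean near 0.5.
u = probability_integral_transform(raw, lambda x: exponential_cdf(x, true_scale))
# Wrong CDF (scale 1 instead of 3): values pile up near 1, mean near 0.75.
u_wrong = probability_integral_transform(raw, lambda x: exponential_cdf(x, 1.0))

print(f"correct CDF: mean={statistics.fmean(u):.3f}")
print(f"wrong CDF:   mean={statistics.fmean(u_wrong):.3f}")
```

If a normal rather than a uniform reference is preferred, the uniform values could in principle be passed through an inverse normal CDF such as `statistics.NormalDist().inv_cdf`, after clipping any values of exactly 0 or 1; this still depends on knowing the true CDF in the first place.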

Bottom line: if you want to know whether your two data sets have the same distribution, use Q-Q plotting. If you want a transformation that will yield a known reference distribution for any (continuous) input distribution, you'll need to know the actual CDF involved.
