user5433002
user5433002

Reputation:

Confidence Interval and Standard error of Skewness and Kurtosis

Please tell me how to calculate skewness and kurtosis along with their respective standard error and confidence interval associated with it(i.e. SE of Skewness and S.E of Kurtosis) I found two packages

1) package:'measure' can only calculate skewness and kurtosis

2) package:'rela' can calcuate both skewness and kurtosis but uses bootstrap by default and no command to turn it off during the calculation.

Upvotes: 3

Views: 8779

Answers (5)

Jake Horban
Jake Horban

Reputation: 66

def skew_kurt(dataframe: pd.DataFrame) -> pd.DataFrame:

out = []
for col in dataframe:
    
    x = dataframe[col]
    sd = x.std()
    
    if sd == 0:
        out.append({name: np.nan for name in ['skew stat', 'skew se', 'kurt stat', 'kurt se']})
        continue
    
    w, m1 = len(x), x.mean()
    dif = x - m1
    m2, m3, m4 = tuple([(dif ** i).sum() for i in range(2, 5)])
    
    skew = w * m3 / (w - 1) / (w - 2) / sd ** 3
    skew_se = np.sqrt(6 * w * (w - 1) / ((w - 2) * (w + 1) * (w + 3)))
    
    kurt = (w * (w + 1) * m4 - 3 * m2 ** 2 * (w - 1)) / ((w - 1) * (w - 2) * (w - 3) * sd ** 4)
    kurt_se = np.sqrt(4 * (w ** 2 - 1) * skew_se ** 2 / ((w - 3) * (w + 5)))

    out.append({'skew stat': skew, 'skew se': skew_se, 'kurt stat': kurt, 'kurt se': kurt_se})
    
dataframe = pd.DataFrame(out, index = list(dataframe))
dataframe['skew<2'] = np.absolute(dataframe['skew stat']) < 2 * dataframe['skew se']
dataframe['kurt<2'] = np.absolute(dataframe['kurt stat']) < 2 * dataframe['kurt se']
    
return dataframe[['skew stat', 'skew se', 'skew<2', 'kurt stat', 'kurt se', 'kurt<2']]

Upvotes: 0

0range
0range

Reputation: 2158

@HBat is right: provided your sample data is Gaussian, you can compute the standard error using the equation from wikipedia

n = len(sample)
se_skew = ((6*n*(n-1))/((n-2)*(n+1)*(n+3)))**0.5

However, @BigBendRegion is also right: if your data is not Gaussian, this does not work. Then you may need to bootstrap.

R has the DescTools package that can bootstrap confidence intervals for skew (among other things). It can be included in python using rpy2 like so:

""" Import rpy2 and the relevant package"""
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
DescTools = importr('DescTools')
""" You will probably need this if you want to work with numpy arrays"""
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()


def compute_skew(data, confidence_level=0.99):
    """ Compute the skew and confidence interval using rpy2, DescTools
        @param data
        @return dict with keys: skew, skew_ci_lower, skew_ci_upper"""
    d = {}
    d["skew"], d["skew_ci_lower"], d["skew_ci_upper"] = DescTools.Skew(data, conf_level=confidence_level)
    return d

""" Call the function on your data (assuming that is saved in a variable named sample)"""
print(compute_skew(sample))

Upvotes: 1

BigBendRegion
BigBendRegion

Reputation: 220

The standard errors are valid for normal distributions, but not for other distributions. To see why, you can run the following code (which uses the spssSkewKurtosis function shown above) to estimate the true confidence level of the interval obtained by taking the kurtosis estimate plus or minus 1.96 standard errors:

set.seed(12345)
Nsim = 10000
Correct = numeric(Nsim)
b1.ols = numeric(Nsim)
b1.alt = numeric(Nsim)
for (i in 1:Nsim) {
 Data = rnorm(1000)  
 Kurt = spssSkewKurtosis(Data)[2,1]
 seKurt =  spssSkewKurtosis(Data)[2,2]
  LowerLimit = Kurt -1.96*seKurt
  UpperLimit = Kurt +1.96*seKurt
  Correct[i] = LowerLimit <= 0 & 0 <= UpperLimit  
 }

TrueConfLevel = mean(Correct)
TrueConfLevel

This gives you 0.9496, acceptably close to the expected 95%, so the standard errors work as expected when the data come from a normal distribution. But if you change Data = rnorm(1000) to Data = runif(1000), then you are assuming that the data come from a uniform distribution, whose theoretical (excess) kurtosis is -1.2. Making the corresponding change from Correct[i] = LowerLimit <= 0 & 0 <= UpperLimit to Correct[i] = LowerLimit <= -1.2 & -1.2 <= UpperLimit gives the result 1.0, meaning that the 95% intervals were always correct, rather than correct for 95% of the samples. Hence, the standard error seems to be overestimated (too large) for the (light-tailed) uniform distribution.

If you change Data = rnorm(1000) to Data = rexp(1000), then you are assuming that the data come from an exponential distribution, whose theoretical (excess) kurtosis is 6.0. Making the corresponding change from Correct[i] = LowerLimit <= 0 & 0 <= UpperLimit to Correct[i] = LowerLimit <= 6.0 & 6.0 <= UpperLimit gives the result 0.1007, meaning that the 95% intervals were correct only for 10.07% of the samples, rather than correct for 95% of the samples. Hence, the standard error seems to be underestimated (too small) for the (heavy-tailed) exponential distribution.

Those standard errors are grossly incorrect for non-normal distributions, as the simulation above shows. Thus, the only use of those standard errors is to compare the estimated kurtosis with the expected theoretical normal value (0.0); e.g., using a test of hypothesis. They cannot be used to construct a confidence interval for the true kurtosis.

Upvotes: 4

HBat
HBat

Reputation: 5702

I'm simply copying and pasting the code published by Howard Seltman in here:

# Skewness and kurtosis and their standard errors as implement by SPSS
#
# Reference: pp 451-452 of
# http://support.spss.com/ProductsExt/SPSS/Documentation/Manuals/16.0/SPSS 16.0 Algorithms.pdf
# 
# See also: Suggestion for Using Powerful and Informative Tests of Normality,
# Ralph B. D'Agostino, Albert Belanger, Ralph B. D'Agostino, Jr.,
# The American Statistician, Vol. 44, No. 4 (Nov., 1990), pp. 316-321

spssSkewKurtosis=function(x) {
  w=length(x)
  m1=mean(x)
  m2=sum((x-m1)^2)
  m3=sum((x-m1)^3)
  m4=sum((x-m1)^4)
  s1=sd(x)
  skew=w*m3/(w-1)/(w-2)/s1^3
  sdskew=sqrt( 6*w*(w-1) / ((w-2)*(w+1)*(w+3)) )
  kurtosis=(w*(w+1)*m4 - 3*m2^2*(w-1)) / ((w-1)*(w-2)*(w-3)*s1^4)
  sdkurtosis=sqrt( 4*(w^2-1) * sdskew^2 / ((w-3)*(w+5)) )
  mat=matrix(c(skew,kurtosis, sdskew,sdkurtosis), 2,
        dimnames=list(c("skew","kurtosis"), c("estimate","se")))
  return(mat)
}

To get skewness and kurtosis of a variable along with their standard errors, simply run this function:

x <- rnorm(100)
spssSkewKurtosis(x)

##             estimate    se
##    skew       -0.684 0.241
##    kurtosis    0.273 0.478

Upvotes: 5

Zahiro Mor
Zahiro Mor

Reputation: 1718

try package psych:

> a <- data.frame(cola=rep(c('A','B','C'),100),colb=sample(1:1000,300),colc=rnorm(300))
> describe(a)
      vars   n   mean     sd median trimmed    mad   min    max  range  skew kurtosis    se
cola*    1 300   2.00   0.82   2.00    2.00   1.48  1.00   3.00   2.00  0.00    -1.51  0.05
colb     2 300 511.76 285.59 506.50  514.21 362.50  1.00 999.00 998.00 -0.04    -1.17 16.49
colc     3 300   0.12   1.04   0.05    0.10   1.07 -2.54   2.91   5.45  0.12    -0.24  0.06
> describe(a)$skew
[1]  0.00000000 -0.04418551  0.11857609

Upvotes: 0

Related Questions