Reputation: 3790
I am interested in Applying Henze-Zirkler's Multivariate Normality Test in python 3x and I was wondering if I may do so in python in Jupyter notebook.
I have fitted a VAR model with my data and the then I would like to test whether the residuals from this fitted VAR model are normally distributed.
How may I do so in Jupyter notebook using python?
Upvotes: 3
Views: 2157
Reputation: 3790
This is another answer since I discover this method later. If you do not want to import the library of R into Python. One may call the output of R to python. i.e. one is capable of activating R function through python as follow:
import rpy2.robjects as robjects
from rpy2.robjects import r
from rpy2.robjects.numpy2ri import numpy2ri
from rpy2.robjects.packages import importr
import numpy as np
suppose that resi is a Dataframe in python say
# Create data
resi = pd.DataFrame(np.random.random((108, 2)), columns=['Number1','Number2'])
Then the code is as follow
#Converting the dataframe from python to R
# firt take the values of the dataframe to numpy
resi1=np.array(resi, dtype=float)
# Taking the variable from Python to R
r_resi = numpy2ri(resi1)
# Creating this variable in R (from python)
r.assign("resi", r_resi)
# Calling libraries in R
r('library("MVN")')
# Calling a function in R (from python)
r("res <- hzTest(resi, qqplot = F)")
# Retrieving information from R to Python
r_result = r("res")
# Printing the output in python
print(r_result)
This will generate the output:
Henze-Zirkler's Multivariate Normality Test
---------------------------------------------
data : resi
HZ : 2.841424
p-value : 1.032563e-06
Result : Data are not multivariate normal.
---------------------------------------------
Update per 2021-08-25 There has been some API changes both to the MVN package and ro rpy2. The following works with MVN version 5.9 and rpy2 version 3.4.
"""Interface file to access the R MVN package"""
import numpy as np
import rpy2.robjects.packages as rpackages
from rpy2.robjects import numpy2ri
from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import StrVector
# Install packages, if they are not already installed
packages_to_install_if_needed = ("MVN",)
utils = rpackages.importr("utils")
utils.chooseCRANmirror(ind=1) # select the first mirror in the list
names_to_install = [x for x in packages_to_install_if_needed if not rpackages.isinstalled(x)]
if len(names_to_install) > 0:
utils.install_packages(StrVector(names_to_install))
# load the package
mvn = importr("MVN")
# Generate data
np_arr = np.random.multivariate_normal(np.ones(2), np.eye(2), size=100)
# activate automatic conversion from numpy to rpy2 interface objects
numpy2ri.activate()
# perform the work
res = mvn.mvn(np_arr)
print(res)
outputting
$multivariateNormality
Test HZ p value MVN
1 Henze-Zirkler 0.3885607 0.8343017 YES
$univariateNormality
Test Variable Statistic p value Normality
1 Anderson-Darling Column1 0.2443 0.7569 YES
2 Anderson-Darling Column2 0.3935 0.3692 YES
$Descriptives
n Mean Std.Dev Median Min Max 25th 75th
1 100 0.9619135 1.0353688 1.0222279 -1.994833 3.679615 0.2696537 1.758255
2 100 0.7664778 0.9134449 0.8121996 -1.568635 2.648268 0.2068718 1.418113
Skew Kurtosis
1 -0.2123274 -0.16171832
2 -0.3718904 -0.05279222
Upvotes: 4
Reputation: 493
There is an open source Python package called Pingouin that provides Henze-Zirkler multivariate normality test and is tested against R's MVM.
https://pingouin-stats.org/generated/pingouin.multivariate_normality.html
Example extracted from the docs:
import pingouin as pg
data = pg.read_dataset('multivariate')
X = data[['Fever', 'Pressure', 'Aches']]
pg.multivariate_normality(X, alpha=.05)
>>> HZResults(hz=0.5400861018514641, pval=0.7173686509624891, normal=True)
Upvotes: 3
Reputation: 3790
There is a package in R that already does this test and it is called MVN
The first thing you have to do is to import MVN into python as described in here
Then go to your jupyter notebook and fit the VAR(1) model to your data as so
# Fit VAR(1) Model
results = Model.fit(1)
results.summary()
Store the residuals as resi
resi=results.resid
Then
# Call function from R
import os
os.environ['R_USER'] = '...\Lib\site-packages\rpy2'
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
from rpy2.robjects.packages import importr
MVN = importr("MVN", lib_loc = "C:/.../R/win-library/3.3")
After importing MVN you can simply do the normality test as so
MVNresult =MVN.hzTest(resi, qqplot = 0)
If you press on
type(MVNresult)
you will find that it is an
rpy2.robjects.methods.RS4
Therefore, in this case you will find this link a very powerful in explaining the details
Then afterwards
tuple(MVNresult.slotnames())
This will show you the observations
('HZ', 'p.value', 'dname', 'dataframe')
Then you may get the values as so
np.array(MVNresult.slots[tuple(MVNresult.slotnames())[i]])[0]
where i
stands for 0, 1, 2, 3
as 'HZ', 'p-value',...
So in case the p-value i.e. i=1
is less than 0.05 then residuals (resi) are not multivariate normal at 5% confidence level.
Upvotes: 2