Reputation: 481
I am trying to find an elegant way to calculate a bivariate normal CDF with python where one upper bound of the CDF is a function of two variables, of which one is a variable of the bivariate normal density (integral variable).
Example:
from scipy import integrate
import numpy as np
# First define f(x, y) as the bivariate normal distribution with fixed correlation p
p = 0.4
def f(x, y):
Q = x**2 + y**2 - 2*p*x*y
return 1/(2*np.pi*np.sqrt(1-p**2))*np.exp(-1/(2*(1-p**2))*Q)
# Define N2(a, b) as the cumulative bivariate normal distribution f where a is constant
# and b is a function of two variables
def N2(a, b):
prob, error = integrate.dblquad(f, np.NINF, a, lambda x: np.NINF, b)
return prob
# Upper bound function of 2 variables example where x is an integral variable
def upper_bound(x, v):
return 0.5*v*x
# My approach which doesn't work
# Calculate bivariate normal CDF for different values of v in the upper bound
results = [N2(1, upper_bound(x, v)) for v in np.linspace(0.01, 4, 200)]
Any ideas on how I could change my approach so the call to upper_bound(x, v)
in
results = [N2(1, upper_bound(x, v)) for v in np.linspace(0.01, 4, 200)]
would work? Other approaches to tackle the problem also welcome.
Edit: This is the integral I want to compute, where f(x,y) is the bivariate normal density function. Note that the actual upper bound f(x,v) = 0.5*v*x I want to compute is way more complicated, this is just as an example, therefore I do not want to compute it symbolically, for instance with sympy. Also, my goal is to compute the integral for a few hundreds different values of v.
Upvotes: 2
Views: 2645
Reputation: 2822
I had to write an option model that was using a bivariate distribution in Python. However, I did not find a prebuilt function that was fast - some seem to be using the random scipy generator to emulate it with the multivariate function. BUT... if you really dig deep and see what the other financial packages are using, it's pretty much ALL code written by one guy, Alan Genz from the University of Washington. He pretty much writes everything in Fortan or MATLAB, and that's it. So if you look into packages that have the bivariate CDF you'll find his name and his code there (I found an old version in MATLAB actually). http://www.math.wsu.edu/faculty/genz/software/software.html
So why this isn't built into SciPy or NumPy already, I have no idea. But I rewrote it in several hours, using both MATLAB and Python to check the resulting code. He's solving with Gauss Legendre quadrature so this will always be much faster than a solution that uses a random number generator: https://en.wikipedia.org/wiki/Gauss%E2%80%93Legendre_quadrature
# Alan Genz - original MATLAB code converted by Matt Slezak to Python
# http://www.math.wsu.edu/faculty/genz/software/matlab/bvnl.m
# Department of Mathematics
# Washington State University
# This is the bivariate CDF distribution
#
# dh 1st upper integration limit
# dk 2nd upper integration limit
# r correlation coefficient
import numpy as np
from scipy.stats import norm
def bivariate_cdf(dh, dk, r):
# dh and dk get signs flipped right away
dh = float(-dh)
dk = float(-dk)
if (dh == np.inf) or (dk == np.inf):
return 0
else:
if dh == - np.inf:
if dk == - np.inf:
return 1
else:
return norm.cdf(- dk)
if dk == - np.inf:
return norm.cdf(- dh)
else:
if r == 0:
return norm.cdf(- dh)*norm.cdf(- dk)
else:
tp=2*np.pi
h=dh
k=dk
hk=h*k
bvn=0
if abs(r) < 0.3:
w=np.array([0.1713244923791705,0.3607615730481384,0.4679139345726904])
x=np.array([0.9324695142031522,0.6612093864662647,0.238619186083197])
else:
if abs(r) < 0.75:
w=np.array([0.04717533638651177,0.1069393259953183,0.1600783285433464,0.2031674267230659,0.2334925365383547,0.2491470458134029])
x=np.array([0.9815606342467191,0.904117256370475,0.769902674194305,0.5873179542866171,0.3678314989981802,0.1252334085114692])
else:
w=np.array([0.01761400713915212,0.04060142980038694,0.06267204833410905,0.08327674157670475,0.1019301198172404,0.1181945319615184,0.1316886384491766,0.1420961093183821,0.1491729864726037,0.1527533871307259])
x=np.array([0.9931285991850949, 0.9639719272779138, 0.9122344282513259,0.8391169718222188, 0.7463319064601508, 0.6360536807265150,0.5108670019508271,0.3737060887154196,0.2277858511416451,0.07652652113349733])
w = np.concatenate((w, w), axis=0)
x = np.concatenate((1 - x, 1 + x), axis=0)
if abs(r) < 0.925:
hs=(h*h+k*k) / 2
asr=np.arcsin(r) / 2
sn=np.sin(asr*x)
bvn=np.dot(np.exp((sn*hk-hs)/(1-sn**2)),w.T)
bvn=bvn*asr/tp + norm.cdf(-h)*norm.cdf(-k)
else:
if r < 0:
k=- k
hk=- hk
if abs(r) < 1:
as1=1 - r ** 2
a=np.sqrt(as1)
bs=(h - k) ** 2
asr=- (bs / as1 + hk) / 2
c=(4 - hk) / 8
d=(12 - hk) / 80
if asr > - 100:
bvn= a*np.exp(asr)*(1-c*(bs-as1)*(1-d*bs)/3+c*d*as1**2)
if hk > - 100:
b=np.sqrt(bs)
sp=np.sqrt(tp)*norm.cdf(-b/a)
bvn=bvn - np.exp(-hk/2)*sp*b*( 1 - c*bs*(1-d*bs)/3 )
a=a / 2
xs=(a*x) ** 2
asr=- (bs / xs + hk) / 2
ix=asr > - 100
xs=xs[ix]
sp=( 1 + c*xs*(1+5*d*xs) )
rs=np.sqrt(1 - xs)
ep=np.exp(- (hk / 2)*xs / (1 + rs) ** 2) / rs
bvn=( a*( np.dot((np.exp(asr[ix])*(sp-ep)),w[ix].T) ) - bvn )/tp
if r > 0:
bvn=bvn + norm.cdf(- max(h,k))
else:
if h >= k:
bvn=- bvn
else:
if h < 0:
L=norm.cdf(k) - norm.cdf(h)
else:
L=norm.cdf(- h) - norm.cdf(- k)
bvn=L - bvn
return max(0,min(1,bvn))
Upvotes: 3
Reputation: 21643
Although it's slow this approach seems to work.
The first few lines, up to 'this should produce 1', are a sanity check. I wanted to verify that my approach would correctly calculate the volume under the density. It does.
I use a variance-covariance matrix to get the desired correlation of 0.4 and avoid writing my own pdf.
I curry functions in two places so that functions have only single parameters. This makes it possible to calculate the inner integral as a function of x. It also makes it possible to take the v parameter 'outside' the other calculations.
from toolz import curry
from scipy.stats import multivariate_normal
from scipy.integrate import quad
import numpy as np
@curry
def bivariate(x,y):
return multivariate_normal.pdf([x,y],cov=[[1,.4],[.4,1]])
def out_y(x):
marginal = bivariate(x)
return quad(marginal, np.NINF, np.PINF)[0]
# this should produce 1
print (quad(out_y, np.NINF, np.PINF)[0])
# now to what the OP wants
@curry
def inner_integral(v,x):
marginal = bivariate(x)
return quad(marginal, np.NINF, 0.5*v*x)[0]
inner_integral_for_one_v = inner_integral(0.8)
print (quad(inner_integral_for_one_v,np.NINF, 1)[0])
To use this code you would write something equivalent to:
for v in range(0,1,0.1):
inner_integral_for_one_v = inner_integral(v)
print (quad(inner_integral_for_one_v,np.NINF, 1)[0])
Upvotes: 1