Arbitrary distribution -> Uniform distribution (Probability Integral Transform?)

Question

I have 500,000 values for a variable derived from financial markets. Specifically, this variable represents distance from the mean (in standard deviations). This variable has an arbitrary distribution. I need a formula that will allow me to select a range around any value of this variable such that roughly the same number of data points falls within that range, whichever value I start from.

This will allow me to then analyze all of the data points within a specific range and to treat them as "similar situations to the input."

From what I understand, this means that I need to transform it from an arbitrary distribution to a uniform distribution. I have read (but barely understood) that what I am looking for is called the "probability integral transform."
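For reference, a standard statement of the probability integral transform, assuming $X$ is a continuous random variable with CDF $F$, together with its sample version (the empirical CDF) for $n$ data points $x_1, \dots, x_n$:

$$
U = F(X) \sim \mathrm{Uniform}(0,1),
\qquad
\hat F_n(x) = \frac{1}{n}\,\#\{\, i : x_i \le x \,\}.
$$

Because $U$ is uniform, intervals of equal width on the $U$ scale hold equal fractions of the data, which is exactly the "equal number of points" property asked for.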

Can anyone assist me with some code (Matlab preferred, but it doesn't really matter) to help me accomplish this?

abcd · Accepted Answer

Here's something I put together quickly. It's not polished and not perfect, but it does what you want to do.

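clear
% Example data: a mixture of two normal distributions (2e4 points in total)
randList=[randn(1e4,1);2*randn(1e4,1)+5];
% Kernel-smoothed estimate of the CDF, evaluated on a grid of 5000 points
[xCdf,xList]=ksdensity(randList,'npoints',5e3,'function','cdf');
% Interval around x = 5 that should contain about 10% of the data
xRange=getInterval(5,xList,xCdf,0.1);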

and the function getInterval is

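function out=getInterval(yPoint,xList,xCdf,areaFraction)
    % CDF value at the chosen point
    yCdf=interp1(xList,xCdf,yPoint);
    % CDF window of total width areaFraction, centred on yCdf
    yCdfRange=[-areaFraction/2, areaFraction/2]+yCdf;

    % Invert the CDF (swap the roles of x and CDF in interp1) to get the interval ends
    out=interp1(xCdf,xList,yCdfRange);
end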
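In formula form (a summary assuming the estimated CDF $F$ is continuous and strictly increasing, so that $F^{-1}$ exists), with $x_0$ = yPoint and $a$ = areaFraction, getInterval returns

$$
[x_{\mathrm{lo}},\, x_{\mathrm{hi}}] = \left[\, F^{-1}\!\left(F(x_0) - \tfrac{a}{2}\right),\ F^{-1}\!\left(F(x_0) + \tfrac{a}{2}\right) \right],
\qquad
\Pr(x_{\mathrm{lo}} \le X \le x_{\mathrm{hi}}) = a,
$$

whenever $F(x_0) \pm a/2$ both lie in $[0, 1]$. The first interp1 call evaluates $F$; the second, with its arguments swapped, evaluates $F^{-1}$.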

Explanation:

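The CDF of the random distribution is shown below by the blue line. You provide a point (here 5, the first input to getInterval) around which you want an interval containing 10% of the probability mass (the input 0.1 to getInterval), i.e. about 10% of the data points. Because the CDF value F(X) of a random point is uniformly distributed (the probability integral transform), a window of width 0.1 on the CDF axis maps back to an interval of x values holding 10% of the data. The chosen point is marked by the red cross and the interval by the green lines. You can get the points from the original list that lie within this interval as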

newList=randList(randList>=xRange(1) & randList<=xRange(2));

You'll find that, on average, the number of points in this example is about 2000, which is 10% of numel(randList):

numel(newList)

ans =

        2045

[Figure: estimated CDF of randList (blue line), the chosen point x = 5 (red cross), and the interval bounds (green lines).]
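The count above is only approximately 10% because getInterval works with a smoothed (kernel) estimate of the CDF. A minimal alternative sketch, assuming the plain empirical CDF is acceptable, skips ksdensity and works with ranks: this is the sample version of the probability integral transform, swapped in here for illustration, not the method of the answer. The names sortedData, k0 and halfWidth are hypothetical.

```matlab
% Example data as in the answer: a two-component normal mixture (2e4 points)
randList = [randn(1e4,1); 2*randn(1e4,1)+5];
x0 = 5;                % point of interest (illustrative)
areaFraction = 0.1;    % fraction of the data wanted in the interval

% Empirical CDF via ranks: sort once; the k-th smallest value has CDF k/n
sortedData = sort(randList);
n = numel(sortedData);

% Transform x0 to its rank: k0 = n * Fhat(x0) = number of points <= x0
k0 = sum(sortedData <= x0);

% Take halfWidth ranks on each side of x0
halfWidth = round(areaFraction * n / 2);
kLo = k0 - halfWidth + 1;
kHi = k0 + halfWidth;

% Near either end of the data, shift the window so it keeps 2*halfWidth ranks
if kLo < 1
    kLo = 1;
    kHi = 2 * halfWidth;
elseif kHi > n
    kHi = n;
    kLo = n - 2 * halfWidth + 1;
end

% Map the ranks back to data values (empirical quantiles) and select the points
xRange = [sortedData(kLo), sortedData(kHi)];
newList = randList(randList >= xRange(1) & randList <= xRange(2));
numel(newList)
```

Every window spans exactly 2*halfWidth ranks, so (barring tied values) it holds exactly that many points instead of about that many. Sorting 5e5 values once is typically much cheaper than a kernel estimate; the trade-off is that the interval ends land on data values rather than on a smooth curve.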

NOTE:

Please note that this was done quickly: I haven't added any checks for whether the chosen point lies outside the range of xList, or whether yCdfRange falls outside the range covered by xCdf (roughly [0 1]); in either case interp1 returns NaN. These checks are fairly straightforward to implement, and I'll leave them to you.
Also, ksdensity is CPU-intensive, and its run time grows with the number of evaluation points, so I wouldn't recommend increasing npoints beyond 1e4. I assume you're working with a fixed list (i.e., you have obtained your 5e5 points once and are now just running tests on them or analyzing them). In that case, you can run ksdensity once and save the result.
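The note above leaves the range checks to the reader. One possible way to add them is sketched below; it is a hedged sketch, not the author's code. It clamps the point to the grid on which the CDF was evaluated, shifts the CDF window back inside the estimated range (a design choice: this keeps roughly the same fraction of data, but the interval is then no longer centred on the point), and drops repeated CDF values, since interp1 needs distinct sample points. Like the answer, it needs ksdensity from the Statistics and Machine Learning Toolbox; yPoint = 14 is an arbitrary tail value chosen for illustration.

```matlab
% Example data and kernel CDF estimate, as in the answer
randList = [randn(1e4,1); 2*randn(1e4,1)+5];
[xCdf, xList] = ksdensity(randList, 'npoints', 5e3, 'function', 'cdf');

yPoint = 14;           % illustrative point in the upper tail (an assumption)
areaFraction = 0.1;    % fraction of the data wanted in the interval

% Check 1: clamp the point to the evaluation grid, otherwise yCdf would be NaN
yPoint = min(max(yPoint, xList(1)), xList(end));
yCdf = interp1(xList, xCdf, yPoint);

% Check 2: if the CDF window sticks out past either end of the estimate,
% shift it back inside (keeping its width) so interp1 does not return NaN
yCdfRange = yCdf + [-areaFraction/2, areaFraction/2];
if yCdfRange(1) < xCdf(1)
    yCdfRange = xCdf(1) + [0, areaFraction];
elseif yCdfRange(2) > xCdf(end)
    yCdfRange = xCdf(end) + [-areaFraction, 0];
end

% interp1 needs distinct sample points; drop repeated CDF values, if any
[cdfUnique, iUnique] = unique(xCdf);
xRange = interp1(cdfUnique, xList(iUnique), yCdfRange);

newList = randList(randList >= xRange(1) & randList <= xRange(2));
numel(newList)
```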
