Tom Kealy

Reputation: 2669

Creating a cumulative distribution from a vector

I need to create a cumulative distribution from some numbers contained in a vector. The vector counts the number of times a dot product operation occurs in an algorithm I've been given.

An example vector would be

myVector = [100 102 101 99 98 100 101 110 102 101 100 99]

I'd like to plot the probability of having fewer than a given number of dot products (for example, fewer than 99) over a range from 0 to 120. The built-in function

cumdist(myVector)

isn't appropriate, as I need to plot over a wider range than cumdist currently provides.

I've tried using

plot([0 N],cumsum(myVector))

but I have multiple entries which are the same value in my vector, and I can't work out how not to double count.

Here is some Python code that does what I want:

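from collections import Counter

count = [x[0] for x in tests]
found = sorted(x[1] for x in tests)
num = Counter(found)
# Take the frequencies in order of increasing value so the running sum is a CDF
freqs = [num[value] for value in sorted(num)]
cumsum = [sum(freqs[:rank + 1]) for rank in range(len(freqs))]
# numtests: total number of tests (assumed to be defined earlier, e.g. len(tests))
normcumsum = [float(x) / numtests for x in cumsum]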

tests is a list of numbers representing the number of times a dot product was done.

Here is an example of what I'm looking for:

[Image: example cumulative distribution]

Upvotes: 4

Views: 2093

Answers (3)

Dennis Jaheruddin

Reputation: 21561

Here is how I would do it:

myVector = [100 102 101 99 98 100 101 110 102 101 100 99];
N = numel(myVector);
x = sort(myVector);
y = 1:N;
[xplot , idx] = unique(x,'last')
yplot = y(idx)/N
stairs(xplot,yplot)

% Optionally, pad the curve so it covers the full range from 0 to 120
xfull = [0 xplot 120]
yfull = [0 yplot 1]
stairs(xfull,yfull)
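
As a cross-check on the construction above, the empirical CDF can also be evaluated directly on a fixed grid. This is only a sketch using base MATLAB; the names xGrid, cdfOnGrid and pFewerThan99 are made up for illustration and the 0 to 120 range is taken from the question.

```matlab
% Sketch: evaluate the empirical CDF on a fixed grid (base MATLAB only).
myVector = [100 102 101 99 98 100 101 110 102 101 100 99];
xGrid = 0:120;   % plotting range requested in the question

% For each grid point t, compute the fraction of samples <= t.
cdfOnGrid = arrayfun(@(t) mean(myVector <= t), xGrid);

stairs(xGrid, cdfOnGrid)
xlabel('number of dot products')
ylabel('fraction of samples <= x')

% Probability of strictly fewer than 99 dot products.
pFewerThan99 = mean(myVector < 99);
```

Because duplicates are handled by the comparison itself, there is no double counting to worry about; the cost is one pass over the data per grid point, which is negligible for vectors of this size.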

Upvotes: 1

horchler

Reputation: 18504

Five hours and an answer already accepted, but if you're still interested in another answer...

What you're trying to do is obtain the empirical CDF of your data. MATLAB's Statistics Toolbox, which you likely have, has a function to do exactly this in a statistically careful manner: ecdf. So all you actually need to do is

myVector = [100 102 101 99 98 100 101 110 102 101 100 99];
[Y,X] = ecdf(myVector);
figure;
plot(X,Y);

You can use stairs instead of plot to display the true shape of the empirical distribution.
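
To cover the 0 to 120 range asked for in the question, the ecdf output can be padded at both ends before plotting. This is a sketch that assumes the Statistics Toolbox is installed and that ecdf returns column vectors; Xfull and Yfull are illustrative names.

```matlab
% Sketch: ecdf output padded to the 0..120 range (needs the Statistics Toolbox).
myVector = [100 102 101 99 98 100 101 110 102 101 100 99];
[Y, X] = ecdf(myVector);   % Y: CDF values, X: evaluation points

% Pad with a zero-probability point at 0 and a probability-one point at 120.
Xfull = [0; X; 120];
Yfull = [0; Y; 1];

figure;
stairs(Xfull, Yfull);
xlim([0 120]);
```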

Upvotes: 2

Jonas

Reputation: 74940

To create a cumulative distribution, you cannot use cumsum on the vector directly. Do the following instead:

sortedVector = sort(myVector(:));
indexOfValueChange = [find(diff(sortedVector)); length(sortedVector)]; % last index of each distinct value
relativeCounts = (1:length(sortedVector))/length(sortedVector);

plot(sortedVector(indexOfValueChange),relativeCounts(indexOfValueChange))
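
An equivalent way to avoid double counting is to count each distinct value first and then take the cumulative sum of those counts, which mirrors the Counter-based Python code in the question. The following is a sketch in base MATLAB; vals, counts and cdfVals are illustrative names.

```matlab
% Sketch: count each distinct value once, then accumulate the counts.
myVector = [100 102 101 99 98 100 101 110 102 101 100 99];

% vals: sorted distinct values; ic maps each sample to its entry in vals.
[vals, ~, ic] = unique(myVector);
counts = accumarray(ic(:), 1);                % occurrences of each distinct value
cdfVals = cumsum(counts) / numel(myVector);   % cumulative fraction of samples

stairs(vals(:), cdfVals)
xlim([0 120])
```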

EDIT

If your goal is just to modify the x-range of your plot,

xlim([0 120]) 

should do what you need.

Upvotes: 3
