Reputation: 2669
I need to create a cumulative distribution from some numbers contained in a vector. The vector counts the number of times a dot product operation occurs in an algorithm I've been given.
An example vector would be
myVector = [100 102 101 99 98 100 101 110 102 101 100 99]
I'd like to plot the probability that I have fewer than 99 dot products, against a range from 0 to 120. The built-in function
cumdist(myVector)
isn't appropriate, as I need to plot over a wider range than cumdist currently provides.
I've tried using
plot([0 N],cumsum(myVector))
but I have multiple entries which are the same value in my vector, and I can't work out how not to double count.
Here is some Python code which does what I want:
from collections import Counter
count = [x[0] for x in tests]
found = [x[1] for x in tests]
found.sort()
num = Counter(found)
# Read the counts in ascending order of value; plain dict order is not guaranteed on older Pythons
freqs = [num[value] for value in sorted(num)]
cumsum = [sum(freqs[:rank + 1]) for rank in range(len(freqs))]
normcumsum = [float(x) / numtests for x in cumsum]
tests is a list of numbers representing the number of times a dot product was done.
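To make the intent concrete, here is a self-contained sketch of the same computation. It assumes the input is simply a flat list of counts (the example vector from above) rather than the pairs my snippet unpacks, and the name `empirical_cdf` is made up for illustration:

```python
from collections import Counter

def empirical_cdf(samples):
    """Return (values, probs): the sorted distinct values and P(X <= value) for each."""
    counts = Counter(samples)
    values = sorted(counts)          # each distinct value appears exactly once
    total = len(samples)
    probs = []
    running = 0
    for v in values:
        running += counts[v]         # add every repeat of v in a single step
        probs.append(running / total)
    return values, probs

my_vector = [100, 102, 101, 99, 98, 100, 101, 110, 102, 101, 100, 99]
values, probs = empirical_cdf(my_vector)
```

Each distinct value contributes one point, so repeated entries are not double counted.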
Here is an example of what I'm looking for:
[Image: Example cumulative distribution]
Upvotes: 4
Views: 2093
Reputation: 21561
Here is how I would do it:
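myVector = [100 102 101 99 98 100 101 110 102 101 100 99];
N = numel(myVector);
x = sort(myVector);
y = 1:N;
% Keep only the last occurrence of each repeated value, so ties are not double counted
[xplot, idx] = unique(x, 'last');
yplot = y(idx)/N;
stairs(xplot, yplot)
% Optionally, pad both ends so the plot spans 0 to 120
xfull = [0 xplot 120];
yfull = [0 yplot 1];
stairs(xfull, yfull)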
Upvotes: 1
Reputation: 18504
Five hours and an answer already accepted, but if you're still interested in another answer...
What you're trying to do is obtain the empirical CDF of your data. MATLAB's Statistics Toolbox, which you likely have, has a function to do exactly this in a statistically careful manner: ecdf. So all you actually need to do is
myVector = [100 102 101 99 98 100 101 110 102 101 100 99];
[Y,X] = ecdf(myVector);
figure;
plot(X,Y);
You can use stairs instead of plot to display the true shape of the empirical distribution.
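The step shape is also what lets you evaluate the curve over any range you like, such as the 0 to 120 asked for in the question. Here is a rough sketch in Python (chosen only to match the snippet in the question). `ecdf_at` is a hypothetical helper, not a MATLAB or library function, and it uses the usual "less than or equal" convention:

```python
from bisect import bisect_right

def ecdf_at(sorted_samples, x):
    """Fraction of samples <= x, i.e. a right-continuous step function."""
    return bisect_right(sorted_samples, x) / len(sorted_samples)

my_vector = [100, 102, 101, 99, 98, 100, 101, 110, 102, 101, 100, 99]
data = sorted(my_vector)

# Evaluate on every integer from 0 to 120, well beyond the observed range
grid = list(range(0, 121))
cdf_on_grid = [ecdf_at(data, x) for x in grid]
```

Below the smallest observation the value is 0, and from the largest observation onward it is 1, so the curve extends naturally to both ends of the grid.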
Upvotes: 2
Reputation: 74940
To create a cumulative distribution, you cannot use cumsum on the vector directly. Do the following instead:
sortedVector = sort(myVector(:));
% Logical mask marking the last occurrence of each distinct value
indexOfValueChange = [diff(sortedVector) ~= 0; true];
relativeCounts = (1:length(sortedVector))/length(sortedVector);
plot(sortedVector(indexOfValueChange),relativeCounts(indexOfValueChange))
EDIT
If your goal is just to modify the x-range of your plot,
xlim([0 120])
should do what you need.
Upvotes: 3