Reputation: 3713
I am using sparse matrices as a means of compressing data (lossy, of course): I create a sparse dictionary from all the values greater than a specified threshold. I'd like the compressed data size to be a variable that the user can choose.
My problem is that I have a sparse matrix with a lot of near-zero values, and I must choose a threshold so that my sparse dictionary is of a specific size (or, eventually, so that the reconstruction error is of a specific rate). Here's how I create my dictionary (taken from Stack Overflow, I think):
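from itertools import izip  # Python 2; on Python 3 use the built-in zip instead
n = abs(smat) > threshold  # smat is flattened (1D); n is a boolean mask
i = mega_range[n]  # mega_range is numpy.arange(smat.shape[0])
v = smat[n]  # the values that survive the threshold
sparse_dict = dict(izip(i, v))  # maps index -> kept value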
How can I find the threshold so that it equals the nth greatest value of my array (smat)?
Upvotes: 2
Views: 613
Reputation: 879093
scipy.stats.scoreatpercentile(arr,per)
returns the value at a given percentile:
import scipy.stats as ss
print(ss.scoreatpercentile([1, 4, 2, 3], 75))
# 3.25
The value is interpolated if the desired percentile lies between two points in arr.
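To see where the 3.25 above comes from, here is a minimal pure-Python sketch of linear interpolation between sorted values. The helper name `interpolated_percentile` is hypothetical, and this only illustrates the idea; it is not SciPy's actual implementation.

```python
def interpolated_percentile(values, per):
    """Linearly interpolated percentile; per is in the range 0..100."""
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted values.
    pos = per / 100.0 * (len(ordered) - 1)
    lower = int(pos)                          # index at or just below pos
    upper = min(lower + 1, len(ordered) - 1)  # next index, clamped at the end
    frac = pos - lower                        # distance past the lower index
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


print(interpolated_percentile([1, 4, 2, 3], 75))
# 3.25
```

For four values, the 75th percentile sits at position 2.25 in the sorted list [1, 2, 3, 4], a quarter of the way from 3 to 4.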
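So if you set per = 100.0 * (len(smat) - n) / len(smat)
then
threshold = ss.scoreatpercentile(abs(smat), per)
should give you (close to) the nth greatest absolute value of the array smat. Note that per is a percentage between 0 and 100, not a fraction, and the 100.0 keeps the division from truncating to an integer on Python 2.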
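If an exact value is wanted rather than an interpolated one, a different technique is to select the nth greatest magnitude directly with `numpy.partition`. This is a sketch under assumptions: the function name `nth_greatest_abs` and the sample data are made up for illustration, and `smat` is assumed to be a 1D array with `1 <= n <= len(smat)`.

```python
import numpy as np


def nth_greatest_abs(smat, n):
    """Return the n-th greatest absolute value of a 1D array (n counts from 1)."""
    magnitudes = np.abs(np.asarray(smat))
    # After partitioning at index size - n, that slot holds the value it
    # would have in a fully sorted array, i.e. the n-th greatest.
    k = magnitudes.size - n
    return np.partition(magnitudes, k)[k]


# Hypothetical sample data, not taken from the question.
smat = np.array([0.01, -0.5, 0.2, -0.03, 0.9, 0.0])
threshold = nth_greatest_abs(smat, 3)
print(threshold)

# Comparing with >= keeps exactly n entries when the magnitudes are distinct.
kept = np.abs(smat) >= threshold
print(int(kept.sum()))
```

Note that the mask in the question uses a strict `>`, which would drop the element equal to the threshold and keep n - 1 entries; use `>=` or ask for the (n+1)th greatest value if exactly n entries are needed. Ties in magnitude can still make the count differ from n.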
Upvotes: 2