Manux

Reputation: 3713

Find n greatest numbers in a sparse matrix

I am using sparse matrices as a means of compressing data (lossy, of course): I create a sparse dictionary from all the values whose magnitude is greater than a specified threshold. I would like the compressed data size to be a variable that my user can choose.

My problem is that I have a sparse matrix with a lot of near-zero values, and I must choose a threshold so that my sparse dictionary has a specific size (or, eventually, so that the reconstruction error stays at a specific rate). Here's how I create my dictionary (taken from Stack Overflow, I think):

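import numpy as np

# smat is a flattened (1D) array; threshold is the cutoff to be determined
mask = np.abs(smat) > threshold         # True where the entry is kept
mega_range = np.arange(smat.shape[0])   # index of every entry
i = mega_range[mask]
v = smat[mask]
sparse_dict = dict(zip(i, v))           # maps index -> value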

How can I find a threshold that is equal to the nth greatest value of my array (smat)?

Upvotes: 2

Views: 613

Answers (1)

unutbu

Reputation: 879093

scipy.stats.scoreatpercentile(arr,per) returns the value at a given percentile:

import scipy.stats as ss
print(ss.scoreatpercentile([1, 4, 2, 3], 75))
# 3.25

The value is interpolated if the desired percentile lies between two points in arr.
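To make the interpolation concrete, here is a minimal pure-Python sketch that reproduces the 3.25 above. It assumes linear interpolation between the two neighbouring sorted points, which I believe is the default behaviour; linear_percentile is a hypothetical helper written for illustration, not part of SciPy.

```python
def linear_percentile(values, per):
    """Percentile (0-100) using linear interpolation between sorted neighbours."""
    data = sorted(values)
    pos = per / 100.0 * (len(data) - 1)    # fractional index into the sorted data
    lower = int(pos)                       # index of the point just below
    upper = min(lower + 1, len(data) - 1)  # index of the point just above
    frac = pos - lower                     # how far we are between the two points
    return data[lower] + frac * (data[upper] - data[lower])

result = linear_percentile([1, 4, 2, 3], 75)
print(result)
```

For the four points above, the 75th percentile falls at fractional index 2.25 in the sorted list, a quarter of the way from 3 to 4.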

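So if you set per = 100.0 * (len(smat) - n) / len(smat) (per is a percentage between 0 and 100, and the float factor avoids integer division), then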

threshold = ss.scoreatpercentile(abs(smat), per)

should give you (close to) the nth greatest absolute value of the array smat.
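If an exact value is needed rather than an interpolated one, a different technique is to use numpy.partition, which places the element at a chosen position where it would sit in a full sort. The following is only a sketch under assumptions: nth_greatest_abs is a hypothetical helper, the sample smat is made up for illustration, and n is taken to be 1-based and no larger than the array length.

```python
import numpy as np

def nth_greatest_abs(values, n):
    """Return the n-th greatest absolute value of a 1D array (n is 1-based)."""
    magnitudes = np.abs(np.asarray(values, dtype=float))
    k = magnitudes.size - n                # ascending-order position of the n-th greatest
    return np.partition(magnitudes, k)[k]  # element k is in its sorted position

# Made-up sample data, for illustration only
smat = np.array([0.01, -5.0, 0.2, 3.0, -0.03, 1.5])
threshold = nth_greatest_abs(smat, 3)

# Use >= so that the n-th greatest entry itself is kept
mask = np.abs(smat) >= threshold
sparse_dict = dict(zip(np.arange(smat.shape[0])[mask], smat[mask]))
print(threshold, sparse_dict)
```

Note that a strict > comparison with this threshold keeps only n - 1 entries, and that ties at the threshold can make >= keep more than n.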

Upvotes: 2
