Reputation: 15803
I'd like to use np.percentile to get a different quantile for each row.
For example, given this 2-row array, I'd like to get the 20th percentile for the first row and the 60th percentile for the second.
import numpy as np
dat = np.array([[1, 10, 3], [4, -1, 5]])
dat
# array([[ 1, 10,  3],
#        [ 4, -1,  5]])
Starting with the 20th percentile (np.percentile expects q on a 0-100 scale, so this is q=20, not 0.2):
np.percentile(dat, 20, axis=1)
# array([1.8, 1. ])
And the 60th:
np.percentile(dat, 60, axis=1)
# array([4.4, 4.2])
Based on this, the ideal result would be [1.8, 4.2].
Passing a vector of percentiles expands the result to an n x n array (one row per percentile, one column per data row):
np.percentile(dat, [20, 60], axis=1)
# array([[1.8, 1. ],
#        [4.4, 4.2]])
The diagonal of this array is the desired result:
np.percentile(dat, [20, 60], axis=1).diagonal()
# array([1.8, 4.2])
But this is prohibitively costly for larger arrays. Is there a way to directly compute the percentiles with a corresponding quantile for each row?
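To make the cost concrete, here is a small sketch of the diagonal approach on a larger array. The sizes and the random data are made up for illustration; the point is only that the intermediate result has one entry per (percentile, row) pair, so it grows quadratically with the number of rows even though only the diagonal is kept.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 500, 20                 # illustrative sizes, not from the question
big = rng.normal(size=(n_rows, n_cols))
q = rng.uniform(0, 100, size=n_rows)     # one percentile per row, 0-100 scale

full = np.percentile(big, q, axis=1)     # every percentile for every row
per_row = full.diagonal()                # keep only the matching pairs

print(full.shape)     # (500, 500)
print(per_row.shape)  # (500,)
```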
Upvotes: 1
Views: 2626
Reputation: 36249
If there are no conflicts with data types, you could concatenate the percentiles and the data, then use np.apply_along_axis to separate the percentile from the data in each row:
import numpy as np

def percentile_qarray_np(dat, q):
    # Prepend q as the first column, so each row is [q_i, data...].
    return np.apply_along_axis(
        lambda x: np.percentile(x[1:], x[0]),
        1,
        np.concatenate([np.array(q)[:, np.newaxis], dat], axis=1)
    )
For example:
n = 10
percentiles = np.linspace(0, 100, n)
a = np.arange(n**2).reshape(n, n)
print(percentile_qarray_np(a, percentiles))
This is now in the synthimpute package.
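As a rough sanity check of the concatenation idea, the sketch below runs it on the 2-row array from the question and compares against an explicit row loop. It also shows the data-type caveat: prepending the percentiles forces a common dtype, so integer data is promoted to float. The variable names `stacked`, `result`, and `expected` are just for illustration.

```python
import numpy as np

dat = np.array([[1, 10, 3], [4, -1, 5]])
q = [20, 60]  # percentiles on the 0-100 scale, one per row

# Prepending q as a column forces a common dtype: the integer data becomes float64.
stacked = np.concatenate([np.asarray(q, dtype=float)[:, np.newaxis], dat], axis=1)
print(stacked.dtype)  # float64

# Each row is [q_i, data...], so x[0] is the percentile and x[1:] is the data.
result = np.apply_along_axis(lambda x: np.percentile(x[1:], x[0]), 1, stacked)

# Reference: an explicit Python loop over rows.
expected = np.array([np.percentile(row, qi) for row, qi in zip(dat, q)])
print(np.allclose(result, expected))  # True
```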
Upvotes: 1
Reputation: 15803
You can use apply after turning the array into a DataFrame with the desired quantile as a column:
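import numpy as np
import pandas as pd

def percentile_qarray_df(dat, q):
    # dat: 2-D numpy array.
    # q: Vector of percentiles (0-100) with one entry per row of dat.
    df = pd.DataFrame(dat)
    df['q'] = q
    # For each row, drop the 'q' entry and use it as the percentile.
    return df.apply(lambda x: np.percentile(x.drop('q'), x.q), axis=1)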
For example:
percentile_qarray_df(dat, [20, 60])
# 0    1.8
# 1    4.2
# dtype: float64
This is still pretty slow though.
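One possible way around the per-row Python call is to do the interpolation by hand. This is a minimal sketch, not something taken from NumPy or synthimpute. It assumes a 2-D array without NaNs, percentiles on the 0-100 scale, and the default linear interpolation of np.percentile. The name `percentile_per_row` is hypothetical.

```python
import numpy as np

def percentile_per_row(dat, q):
    """Percentile q[i] of row i, using linear interpolation (hypothetical helper)."""
    dat = np.asarray(dat, dtype=float)
    q = np.asarray(q, dtype=float)
    n_rows, n_cols = dat.shape

    sorted_rows = np.sort(dat, axis=1)       # sort each row once
    pos = q / 100.0 * (n_cols - 1)           # fractional index into each sorted row
    lo = np.floor(pos).astype(int)           # index of the value just below
    hi = np.minimum(lo + 1, n_cols - 1)      # index just above, clipped at the end
    frac = pos - lo                          # interpolation weight in [0, 1)

    rows = np.arange(n_rows)
    lower = sorted_rows[rows, lo]
    upper = sorted_rows[rows, hi]
    return lower + (upper - lower) * frac

dat = np.array([[1, 10, 3], [4, -1, 5]])
print(np.allclose(percentile_per_row(dat, [20, 60]), [1.8, 4.2]))  # True
```

This sorts every row and then picks two values per row, so it never builds the n x n intermediate. Whether it is actually faster for a given array size is worth timing rather than assuming.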
Upvotes: 0