Max Ghenis

Reputation: 15803

Get different quantile for each row using numpy percentile

I'd like to use np.percentile to get a different quantile for each row.

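For example, given this 2-row array, I'd like the percentile at q = 0.2 for the first row and at q = 0.6 for the second. (np.percentile takes q on a 0 to 100 scale, so these are the 0.2nd and 0.6th percentiles; the 20th and 60th would be q = 20 and q = 60.)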

import numpy as np

dat = np.array([[1, 10, 3], [4, -1, 5]])
dat
# array([[ 1, 10,  3],
#        [ 4, -1,  5]])

Starting with q = 0.2:

np.percentile(dat, 0.2, axis=1)
# array([ 1.008, -0.98 ])

And q = 0.6:

np.percentile(dat, 0.6, axis=1)
# array([ 1.024, -0.94 ])

Based on this, the ideal result would be [1.008, -0.94].

Passing a vector of quantiles returns one row of results per quantile, so with one quantile per row of dat the result is an n x n array:

np.percentile(dat, [0.2, 0.6], axis=1)
# array([[ 1.008, -0.98 ],
#        [ 1.024, -0.94 ]])

The diagonal of this array is the desired result:

np.percentile(dat, [0.2, 0.6], axis=1).diagonal()
# array([ 1.008, -0.94 ])

But this computes n^2 percentiles only to keep n of them, which is prohibitively costly for larger arrays. Is there a way to compute each row's percentile directly from its own quantile?

Upvotes: 1

Views: 2626

Answers (2)

a_guest

Reputation: 36249

If the data types are compatible (concatenation casts everything to a common dtype), you can prepend the percentiles to the data as an extra column and then use np.apply_along_axis to split each row back into its percentile and its data:

import numpy as np

def percentile_qarray_np(dat, q):
    # Prepend each row's percentile as column 0, then split it back off per row.
    return np.apply_along_axis(
        lambda x: np.percentile(x[1:], x[0]),
        1,
        np.concatenate([np.array(q)[:, np.newaxis], dat], axis=1)
    )

For example:

n = 10
percentiles = np.linspace(0, 100, n)
a = np.arange(n**2).reshape(n, n)
print(percentile_qarray_np(a, percentiles))
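As a sanity check on this example, the result can be worked out by hand: row i holds the values 10i through 10i + 9, and its percentile 100i/9 lands exactly on position i of the sorted row, so the expected output is approximately 0, 11, 22, ..., 99. The sketch below compares the helper against that and against the n x n diagonal approach from the question; the names `result` and `reference` are only illustrative.

```python
import numpy as np

def percentile_qarray_np(dat, q):
    # Prepend each row's percentile as column 0, then split it back off per row.
    stacked = np.concatenate([np.asarray(q, dtype=float)[:, np.newaxis], dat], axis=1)
    return np.apply_along_axis(lambda x: np.percentile(x[1:], x[0]), 1, stacked)

n = 10
percentiles = np.linspace(0, 100, n)
a = np.arange(n**2).reshape(n, n)
result = percentile_qarray_np(a, percentiles)

# Reference: compute every percentile for every row, then keep the diagonal.
reference = np.percentile(a, percentiles, axis=1).diagonal()

# Both comparisons should hold up to floating-point tolerance.
print(np.allclose(result, reference))
print(np.allclose(result, 11 * np.arange(n)))
```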

This is now in the synthimpute package.

Upvotes: 1

Max Ghenis

Reputation: 15803

You can use apply after turning the array into a DataFrame with the desired quantile as a column:

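import numpy as np
import pandas as pd

def percentile_qarray_df(dat, q):
    # dat: 2-D numpy array.
    # q: vector of percentiles (0 to 100), one per row of dat.
    df = pd.DataFrame(dat)
    df['q'] = q
    # For each row, drop the 'q' entry and take that row's percentile.
    return df.apply(lambda x: np.percentile(x.drop('q'), x['q']), axis=1)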

For example:

percentile_qarray_df(dat, [0.2, 0.6])
# 0    1.008
# 1   -0.940
# dtype: float64

This is still pretty slow, though, since apply with axis=1 calls np.percentile once per row in Python.
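One way to avoid the per-row Python call might be to sort each row once and do the linear interpolation by hand. The following is a minimal sketch rather than a tested library routine: the function name `percentile_qarray_sorted` is hypothetical, and it assumes np.percentile's default linear interpolation, q values between 0 and 100, and no NaNs in dat.

```python
import numpy as np

def percentile_qarray_sorted(dat, q):
    # Assumptions: dat is 2-D, q has one value in [0, 100] per row,
    # and the default linear interpolation of np.percentile is wanted.
    dat = np.asarray(dat, dtype=float)
    q = np.asarray(q, dtype=float)
    n_rows, n_cols = dat.shape

    sorted_dat = np.sort(dat, axis=1)

    # Fractional position of each row's percentile within its sorted row.
    pos = q / 100 * (n_cols - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_cols - 1)  # clamp so q = 100 stays in bounds
    frac = pos - lo

    # Pick the two neighbouring order statistics per row and interpolate.
    rows = np.arange(n_rows)
    lower = sorted_dat[rows, lo]
    upper = sorted_dat[rows, hi]
    return lower + (upper - lower) * frac

dat = np.array([[1, 10, 3], [4, -1, 5]])
result = percentile_qarray_sorted(dat, [0.2, 0.6])
print(result)  # approximately [1.008, -0.94], as with the diagonal approach
```

This does one sort per row and a handful of array operations, so it never builds the n x n intermediate array. It is worth checking against np.percentile on real data before relying on it, especially if a non-default interpolation method or NaN handling is needed.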

Upvotes: 0
