David Marx

Reputation: 8558

Best practice for fancy indexing a numpy array along multiple axes

I'm trying to optimize an algorithm to reduce memory usage, and I've identified this particular operation as a pain point.

I have a symmetric matrix, an index array for the rows, and another index array for the columns (which is simply every index not selected by the row index). I feel like I should be able to pass in both index arrays at the same time, but I find myself forced to select along one axis and then the other. This causes memory issues, because I don't actually need the copy of the array that's returned, only the statistics I calculate from it. Here's what I am trying to do:

from scipy.spatial.distance import pdist, squareform
from sklearn import datasets
import numpy as np

iris = datasets.load_iris().data

dx = pdist(iris)
mat = squareform(dx)

outliers = [41,62,106,108,109,134,135]
inliers = np.setdiff1d(np.arange(iris.shape[0]), outliers)

# What I want to be able to do:
scores = mat[inliers, outliers].min(axis=0)

Here's what I'm actually doing to make this work:

# What I'm being forced to do:
s1 = mat[:,outliers]
scores = s1[inliers,:].min(axis=0)

Because I'm fancy indexing, s1 is a new array instead of a view. I only need this array for one operation, so I would prefer either to avoid the copy altogether or at least to make the new array smaller (i.e., by applying both index selections at once rather than in two separate fancy-indexing operations).
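For context, here is a minimal sketch of why the one-step form above does not work as written. It uses a small hypothetical integer matrix in place of the iris distance matrix, and the names rows, cols, and paired are illustrative only. When two 1-D index arrays are passed together, NumPy pairs them element by element, so arrays of different lengths cannot be combined.

```python
import numpy as np

# Hypothetical 6x6 stand-in for the distance matrix.
mat = np.arange(36).reshape(6, 6)
rows = np.array([0, 2, 3, 5])   # plays the role of inliers
cols = np.array([1, 4])         # plays the role of outliers

# Index arrays of shapes (4,) and (2,) cannot be broadcast together.
try:
    mat[rows, cols]
    error = None
except IndexError as exc:
    error = exc

# Equal-length index arrays are paired: this picks mat[0, 1] and mat[2, 4].
paired = mat[np.array([0, 2]), np.array([1, 4])]
```

So the goal is not pairing but an outer selection of every row index with every column index, which is what the answers address.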

Upvotes: 4

Views: 2502

Answers (3)

alyaxey

Reputation: 1169

There's a better way in terms of readability:

result = mat[np.ix_(inliers, outliers)].min(0)

https://docs.scipy.org/doc/numpy/reference/generated/numpy.ix_.html#numpy.ix_
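As a rough illustration of what np.ix_ does here, the sketch below uses a small random symmetric matrix as a stand-in for the distance matrix (the size and index values are assumptions for the example). np.ix_ returns one index array per axis, shaped so that they broadcast against each other into the full rows-by-columns selection.

```python
import numpy as np

# Hypothetical stand-in for the distance matrix: a small symmetric array.
rng = np.random.default_rng(0)
a = rng.random((6, 6))
mat = (a + a.T) / 2

outliers = [1, 4]
inliers = np.setdiff1d(np.arange(mat.shape[0]), outliers)

# np.ix_ reshapes the 1-D index sequences into an open mesh:
# rows has shape (4, 1) and cols has shape (1, 2).
rows, cols = np.ix_(inliers, outliers)

# The two index arrays broadcast to shape (4, 2).
sub = mat[rows, cols]
scores = sub.min(axis=0)

# Same values as the two-step selection from the question.
two_step = mat[:, outliers][inliers, :].min(axis=0)
```

The result is still a copy, but only of the selected block rather than of all rows.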

Upvotes: 3

eickenberg

Reputation: 14377

Try:

outliers = np.array(outliers)  # outliers is a list; it must be an array to be indexed with np.newaxis
result = mat[inliers[:, np.newaxis], outliers[np.newaxis, :]].min(0)

Upvotes: 1

Warren Weckesser

Reputation: 114811

"Broadcasting" applies to indexing. You could convert inliers into a column array (e.g. inliers.reshape(-1,1) or inliers[:, np.newaxis], so it has shape (m,1)) and use it as the first index of mat:

s1 = mat[inliers.reshape(-1,1), outliers]
scores = s1.min(axis=0)
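To make the memory effect concrete, here is a small sketch under assumed sizes (an 8x8 random symmetric matrix with three outliers, not the iris data). The broadcast index produces a copy of shape (m, n), whereas the intermediate array in the two-step approach has shape (N, n) for an N x N matrix.

```python
import numpy as np

# Hypothetical stand-in for the distance matrix.
rng = np.random.default_rng(1)
a = rng.random((8, 8))
mat = (a + a.T) / 2

outliers = np.array([2, 5, 7])
inliers = np.setdiff1d(np.arange(mat.shape[0]), outliers)

# A (5, 1) row index broadcast against a (3,) column index gives a (5, 3) result.
s1 = mat[inliers.reshape(-1, 1), outliers]

# Each element s1[i, j] is mat[inliers[i], outliers[j]].
matches = all(
    s1[i, j] == mat[r, c]
    for i, r in enumerate(inliers)
    for j, c in enumerate(outliers)
)

# The two-step approach first materializes all 8 rows for the chosen columns.
intermediate = mat[:, outliers]
```

Fancy indexing still returns a copy rather than a view, so this reduces the size of the temporary array but does not eliminate it.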

Upvotes: 5
