Joe

Reputation: 3991

List comprehension-like approach for numpy arrays with more than one dimension

I have a list of 2D numpy arrays with the same height but different widths:

list_of_arrays = [np.random.rand(3,4),np.random.rand(3,5),np.random.rand(3,6)]

I want to build a new array where each column is a random column from the corresponding array in my list. I can do this with a for loop, e.g.:

new_array = np.zeros((3,3))
for x in range(3):
    new_array[:,x] = list_of_arrays[x][:,np.random.randint(0,list_of_arrays[x].shape[1])]

This does not feel clean to me. I would like to use a list comprehension-like approach, e.g.:

new_array = [list_of_arrays[x][:,np.random.randint(0,list_of_arrays[x].shape[1])] for x in range(3)]

This obviously returns a list, not an array as desired. I could convert the list into an array, but that adds an extraneous intermediate. Is there a simple way to do this? Similar questions that I have seen for 1D arrays use numpy.fromiter, but that does not work in 2 dimensions.

If anyone wants to suggest entirely different/cleaner/more efficient ways to solve this problem, that would be appreciated as well.

Upvotes: 3

Views: 7498

Answers (2)

wflynny

Reputation: 18521

You could simplify your list comprehension by iterating over the arrays themselves instead of over their indices:

new_array = np.array([x[:,np.random.randint(0, x.shape[1])] for x in list_of_arrays]).T

In [32]: %timeit np.array([x[:,np.random.randint(0, x.shape[1])] for x in a]).T
100000 loops, best of 3: 10.2 us per loop

The transposes (.T) are needed because iterating over a 2D array yields its rows, so iterating over arr.T yields its columns. Likewise, np.array treats each element of the list it is given as a row, so the result has to be transposed afterwards to turn the selected columns back into columns.
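As a minimal sketch of the row/column point above: the snippet below checks that iteration yields rows and compares the transpose approach with np.column_stack, which places 1D arrays as columns directly. Note that np.random.default_rng, rng.integers and np.column_stack are not used in this answer; they are swapped in here as assumptions, as is the fixed seed.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption, for reproducibility)
list_of_arrays = [rng.random((3, 4)), rng.random((3, 5)), rng.random((3, 6))]

# Iterating over a 2D array yields its rows; iterating over arr.T yields its columns.
arr = list_of_arrays[0]
n_rows = len(list(arr))      # 3
n_cols = len(list(arr.T))    # 4

# Pick one random column per array; rng.integers excludes the upper bound.
picked = [x[:, rng.integers(x.shape[1])] for x in list_of_arrays]

# np.array treats each picked column as a row, hence the transpose...
via_transpose = np.array(picked).T
# ...whereas np.column_stack places each 1D array as a column directly.
via_column_stack = np.column_stack(picked)
```

Both results have shape (3, 3) and hold the same values; which one reads better is largely a matter of taste.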

If you import the standard random module, you could instead do:

new_array = np.array([random.choice(x.T) for x in list_of_arrays]).T

In [36]: %timeit np.array([random.choice(x.T) for x in a]).T
100000 loops, best of 3: 9.18 us per loop

which is slightly faster.

Upvotes: 2

Lee

Reputation: 31050

Could you combine the arrays into another array rather than a list?

>>> b= np.hstack((np.random.rand(3,4),np.random.rand(3,5),np.random.rand(3,6)))
>>> b.shape
(3, 15)

Then you can use integer array indexing, rather than a list comprehension, to pick random columns:

new_array=b[:,np.random.randint(0,b.shape[1],3)]
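One caveat: the line above draws three columns from the pooled 15, so it does not guarantee one column per original array. If that is what is wanted, a possible variation (a sketch under assumptions, not part of this answer) is to offset a per-array random index by where each array's block starts in b. The names widths, offsets and cols are hypothetical, and np.random.default_rng is swapped in for np.random.randint.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption, for reproducibility)
list_of_arrays = [rng.random((3, 4)), rng.random((3, 5)), rng.random((3, 6))]

b = np.hstack(list_of_arrays)                             # shape (3, 15)
widths = np.array([x.shape[1] for x in list_of_arrays])   # [4, 5, 6]
offsets = np.concatenate(([0], np.cumsum(widths)[:-1]))   # [0, 4, 9]

# One random index inside each array's own block of columns.
# rng.integers accepts an array for the (exclusive) upper bound.
cols = offsets + rng.integers(0, widths)

# Integer array indexing selects all three columns in one step.
new_array = b[:, cols]                                    # shape (3, 3)
```

This keeps the selection vectorised while ensuring column i of new_array comes from list_of_arrays[i].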

Upvotes: 0
