Filling 2D numpy array based on True/False value contained in another array in Python

Question

I have a 1D empty numpy array (b) of size 4 into which I want to stack columns. The values contained in the columns depend on another 1D numpy array (a) containing True/False bool values.

I have managed to fill it the way I want using for loops, but I think it can be done more efficiently using slices.

Here is the working code giving me the correct result:

import numpy as np
import random

b = np.empty(4, dtype=object)  # The array we are trying to fill

for i in range(5):
    # a contains 4 random True/False values
    a = np.random.randint(0, 2, size=4, dtype=bool)

    # If a row is True in a, that row of b gets data, otherwise nan
    data = random.random()

    for row, value in enumerate(a):
        if b[row] is None:  # b[row] is still empty, so initialize it
            if value:  # The row in a is True: fill with data
                b[row] = np.array([data])
            else:
                b[row] = np.array([np.nan])
        else:  # b[row] already holds an array, so append to it
            if value:
                b[row] = np.hstack([b[row], data])
            else:
                b[row] = np.hstack([b[row], np.nan])
print(b)

Output:

array([array([       nan, 0.04209371, 0.03540539,        nan, 0.59604905]),
       array([0.66677989,        nan, 0.03540539,        nan,        nan]),
       array([0.66677989, 0.04209371, 0.03540539,        nan, 0.59604905]),
       array([0.66677989, 0.04209371, 0.03540539,        nan,        nan])],
      dtype=object)
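For comparison, here is a minimal sketch of the same fill that drops the inner loop, assuming a regular 2D float array (rows = positions in a, columns = iterations) is acceptable instead of an object array of arrays. It swaps in np.random.default_rng for the random values; n_rows, n_iters and the seed are illustrative choices, not part of the original code.

```python
import numpy as np

n_rows, n_iters = 4, 5
rng = np.random.default_rng(0)  # assumption: Generator API instead of np.random/random

# Preallocate one column per iteration, pre-filled with nan
b = np.full((n_rows, n_iters), np.nan)

for i in range(n_iters):
    a = rng.integers(0, 2, size=n_rows).astype(bool)  # 4 random True/False values
    data = rng.random()                                # one scalar per iteration
    # Rows where a is True get data in column i; the other rows keep nan
    b[a, i] = data

print(b)
```

This keeps the outer loop, which suits values that arrive one iteration at a time; the trade-off is that the number of iterations must be known up front to preallocate b.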

I have tried the following code using slices (boolean indexing) of numpy arrays, but it gives me an error:

b = np.empty(4, dtype=object)

for i in range(5):
    a = np.random.randint(0, 2, size=4, dtype=bool)
    data = random.random()
    b[a] = np.vstack([b[a], np.zeros(len(b[a])) + data])
print(b)

Output:

TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions
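The error can be traced by checking shapes. A minimal sketch, using a fixed mask and a hypothetical scalar data = 0.5 in place of the random values: b[a] selects a 1-D object array with one entry per True in a, np.vstack stacks it with the data row into a 2-D (2, k) array, and assignment through a single boolean index only accepts a scalar or a 1-D value.

```python
import numpy as np

b = np.empty(4, dtype=object)            # four None elements
a = np.array([True, False, True, True])  # fixed mask, chosen for illustration
data = 0.5                               # hypothetical scalar

selected = b[a]                          # 1-D object array, one entry per True
stacked = np.vstack([selected, np.zeros(len(selected)) + data])
print(selected.shape, stacked.shape)     # (3,) (2, 3)

raised = False
try:
    b[a] = stacked                       # 2-D value into a boolean-indexed target
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```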

I am trying to find the most efficient way of solving this problem. Any suggestions?

hpaulj · Accepted Answer

I haven't tried to figure out what is wrong with your 2nd approach.

From the output, your first approach creates a 4 element object array, where each element is a 5 element array (one value per iteration), with np.nan wherever a was False.

Here's a direct 2d array approach to generating the same sort of array:

A 4x4 array of random floats:

In [29]: b = np.random.rand(4,4)
In [30]: b
Out[30]: 
array([[0.12820464, 0.41477273, 0.35926356, 0.15205777],
       [0.28082327, 0.76574665, 0.2489097 , 0.17054426],
       [0.20950568, 0.78342284, 0.14498205, 0.52107821],
       [0.74684041, 0.83661847, 0.29467814, 0.66062565]])

Same size boolean array:

In [31]: a = np.random.randint(0,2, size=(4,4), dtype=bool)
In [32]: a
Out[32]: 
array([[False,  True, False,  True],
       [ True,  True, False,  True],
       [False, False, False, False],
       [False, False,  True, False]])

Using a as a mask or boolean index, replace each corresponding element of b with nan:

In [33]: b[a]=np.nan
In [34]: b
Out[34]: 
array([[0.12820464,        nan, 0.35926356,        nan],
       [       nan,        nan, 0.2489097 ,        nan],
       [0.20950568, 0.78342284, 0.14498205, 0.52107821],
       [0.74684041, 0.83661847,        nan, 0.66062565]])

This is a real 2d array of floats, not an array of arrays. The object array approach mimics what you would do with lists, but it is not idiomatic numpy.
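To match the question's exact semantics (data where a is True, nan where it is False, one scalar per iteration), a fully vectorized sketch can draw all masks and all scalars at once and let np.where broadcast the per-iteration values down the rows. The names n_rows, n_iters, mask and the use of np.random.default_rng are assumptions for illustration.

```python
import numpy as np

n_rows, n_iters = 4, 5
rng = np.random.default_rng(42)  # assumption: Generator API, fixed seed

# All masks at once: column i plays the role of `a` in iteration i
mask = rng.integers(0, 2, size=(n_rows, n_iters)).astype(bool)
# One scalar per iteration, shape (n_iters,)
data = rng.random(n_iters)

# data broadcasts from (n_iters,) to (n_rows, n_iters):
# True -> data[i], False -> nan
b = np.where(mask, data, np.nan)

print(b)
```

Note that b[a] = np.nan in the session above sets the True positions to nan; with the question's convention (True keeps the data), the equivalent step is b[~a] = np.nan.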
