Numpy Arrays: Extracting preferentially ordered values from array with Nans without padding?

Question

Suppose I have an array of shape (M, N) in which each of the N "columns" holds the data recordings of one machine. Let's also imagine that each of the M "rows" represents a unique "timestamp" at which data was recorded for all N machines.

The array is ordered so that row 0 corresponds to the very first "timestamp" (t0) and the last row, at index M-1 (tm), represents the most recent "timestamp" recording.

Let's call this array "AX." AX[0] would yield the recorded data for N machines at the very 1st "timestamp". AX[-1] would be the most recent recordings.

Here is my array:

>>AX = np.random.randn(3, 5)

array([[ 0.53826804, -0.9450442 , -0.10279278,  0.47251871,  0.32050493],
       [-0.97573464, -0.42359652, -0.00223274,  0.7364234 ,  0.83810714],
       [-0.07626913,  0.85246932, -0.13736392, -1.39977431, -1.39882156]])

Now imagine something went wrong and data wasn't captured consistently for every machine at every "timestamp". To create an example of what the output might look like, I followed the example linked below to insert NaNs at random positions in the array:

Create sample numpy array with randomly placed NaNs

>>AX.ravel()[np.random.choice(AX.size, 9, replace=False)] = np.nan


array([[ 0.53826804, -0.9450442 ,         nan,  0.47251871,         nan],
       [        nan,         nan,         nan,  0.7364234 ,  0.83810714],
       [-0.07626913,         nan,         nan,         nan,         nan]])

Let's assume that I need to provide the most recent values of the recorded data. Ideally this would be as easy as referencing AX[-1]. In this particular case, I would hardly have any data since everything got screwed up.

>>AX[-1]

array([-0.07626913,         nan,         nan,         nan,         nan])

GOAL:

I realize any data is better than nothing, so I would like to use the most recent value recorded for each machine. In this particular scenario, the best I could do is provide an array with the values:

[-0.07626913, -0.9450442, 0.7364234, 0.83810714]

Notice that column 2 of AX had no usable data, so I just skipped its output.
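
To make the goal concrete, here is a naive loop that spells out the behaviour I am after. It is only a sketch: the helper name latest_per_column is made up for illustration, the array is typed in by hand so the example is reproducible, and I assume there is a more idiomatic NumPy way to do this.

```python
import numpy as np

nan = np.nan
# The damaged array from above, entered by hand so the result is reproducible.
AX = np.array([
    [ 0.53826804, -0.9450442 ,         nan,  0.47251871,         nan],
    [        nan,         nan,         nan,  0.7364234 ,  0.83810714],
    [-0.07626913,         nan,         nan,         nan,         nan],
])

def latest_per_column(arr):
    """Hypothetical helper: newest non-NaN value per column, skipping all-NaN columns."""
    latest = []
    for col in arr.T:                 # one column per machine
        valid = col[~np.isnan(col)]   # recorded values only, still in time order
        if valid.size:                # a machine with no data at all is skipped
            latest.append(valid[-1])  # last valid entry is the most recent one
    return np.array(latest)

print(latest_per_column(AX))
```

For the array above this should give the four values listed in the goal, in column order, with column 2 left out.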

I do not find np.arrays to be very intuitive and as I read through the documentation, I am overwhelmed by the amount of specialized functions and transforms.

My initial idea was to filter out all of the NaNs into a new array (AY), and then take the last row AY[-1] (assuming this would retain its important row-based ordering). Then I realized that this would produce an array with a strange, ragged shape (I'm just using integer values here for convenience instead of AX's values):

[1,2,3],
[4,5],
[6]

Assuming that is even possible to create, taking the last "row"(?) would yield [6,5,3] and would totally mess everything up. Padding an array with values is also bad because the most recent values would be pads for 4 out of 5 data points in the most recent "timestamp" row.
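
A small sketch of why the filtering idea breaks down, using the same hand-typed copy of the damaged array (the variable names ragged and lengths are just for illustration). Dropping NaNs row by row leaves lists of different lengths, and a value's position in its list no longer tells you which machine it came from.

```python
import numpy as np

nan = np.nan
# Same damaged array as above, entered by hand.
AX = np.array([
    [ 0.53826804, -0.9450442 ,         nan,  0.47251871,         nan],
    [        nan,         nan,         nan,  0.7364234 ,  0.83810714],
    [-0.07626913,         nan,         nan,         nan,         nan],
])

# Drop the NaNs from each row separately; the result is a list of lists,
# not a rectangular array.
ragged = [row[~np.isnan(row)].tolist() for row in AX]
lengths = [len(r) for r in ragged]

print(lengths)        # row lengths differ, so this cannot be a regular 2-D array
print(ragged[1][0])   # first entry of the filtered row 1 ...
print(AX[1, 3])       # ... is really the reading of machine 3, not machine 0
```

So even if the ragged structure could be stored, the column alignment that identifies each machine would be gone.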

Is there a way to achieve what I want in a fairly painless manner while still using the np.array structure and avoiding dataframes and panels?

Thanks!
