ojp

Reputation: 1033

Scale a 3D NumPy array column-wise along the first dimension

I have a 3D NumPy array representing time-series data, i.e. [number of samples, time steps, features].

I would like to scale each feature between -1 and 1. However, each feature should be scaled with respect to the maximum and minimum of all samples in the first dimension of my array. For example, my array is of shape:

multi_data.shape
(66, 5004, 2)

I tried the following:

data_min = multi_data.min(axis=1, keepdims=True)  # shape (66, 1, 2): per-sample minimum
data_max = multi_data.max(axis=1, keepdims=True)  # shape (66, 1, 2): per-sample maximum
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1

The problem is that this scales each "batch" (the first dimension of my array) independently. What I am trying to do is take the max and min of each of my two features across all 66 batches, and then scale each feature by those values, but I can't quite work out how to achieve this. Any pointers would be very welcome.
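
For reference, a quick shape check (a minimal sketch using random stand-in data) shows why the attempt above scales per batch: reducing over axis=1 keeps a separate min/max for each of the 66 samples.

import numpy as np

multi_data = np.random.rand(66, 5004, 2)  # stand-in for the real time-series data

data_min = multi_data.min(axis=1, keepdims=True)
print(data_min.shape)  # (66, 1, 2): one minimum per sample per feature, not one per feature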

Upvotes: 3

Views: 552

Answers (1)

Quang Hoang

Reputation: 150735

How about chaining that with another min/max:

data_min = multi_data.min(axis=1, keepdims=True).min(axis=0, keepdims=True)  # shape (1, 1, 2)
data_max = multi_data.max(axis=1, keepdims=True).max(axis=0, keepdims=True)  # shape (1, 1, 2)
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1

Or:

data_min = multi_data.min(axis=(0,1), keepdims=True)  # reduce both axes in one call; shape (1, 1, 2)
data_max = multi_data.max(axis=(0,1), keepdims=True)
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1

Since you're taking the min/max over the first two dimensions, you can drop keepdims altogether: the result has shape (2,), which broadcasts against the trailing feature axis automatically:

data_min = multi_data.min(axis=(0,1))  # shape (2,): one value per feature
data_max = multi_data.max(axis=(0,1))
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1
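
A quick sanity check (a minimal sketch with random stand-in data) confirms that each feature now spans [-1, 1] across all batches rather than per batch:

import numpy as np

multi_data = np.random.rand(66, 5004, 2)  # stand-in for the real data

data_min = multi_data.min(axis=(0,1))
data_max = multi_data.max(axis=(0,1))
scaled = (2*(multi_data-data_min)/(data_max-data_min))-1

print(scaled.min(axis=(0,1)))  # [-1. -1.]
print(scaled.max(axis=(0,1)))  # [ 1.  1.]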

Upvotes: 4
