ojp

Reputation: 1033

Scale a 3D NumPy array column-wise along the first dimension

I have a 3D NumPy array representing time-series data, i.e. [number of samples, time steps, features].

I would like to scale each feature between -1 and 1. However, each feature should be scaled with respect to the maximum and minimum of all samples in the first dimension of my array. For example, my array is of shape:

multi_data.shape
(66, 5004, 2)

I tried the following:

data_min = multi_data.min(axis=1, keepdims=True)  # shape (66, 1, 2): per-sample minimum
data_max = multi_data.max(axis=1, keepdims=True)  # shape (66, 1, 2): per-sample maximum
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1

The problem is that this scales each "batch" (the first dimension of my array) independently. What I am trying to do is take the max and min of each of my two features across all 66 batches, and then scale each feature by those values, but I can't quite work out how to achieve this. Any pointers would be very welcome.
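
For reference, a quick shape check (a minimal sketch using random stand-in data) shows why the attempt above scales per batch: reducing over axis=1 keeps a separate min/max for each of the 66 samples.

import numpy as np

multi_data = np.random.rand(66, 5004, 2)  # stand-in for the real time-series data

data_min = multi_data.min(axis=1, keepdims=True)
print(data_min.shape)  # (66, 1, 2): one minimum per sample per feature, not one per feature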

Upvotes: 3

Views: 552

Answers (1)

Quang Hoang

Reputation: 150735

How about chaining that with another min/max:

data_min = multi_data.min(axis=1, keepdims=True).min(axis=0, keepdims=True)  # shape (1, 1, 2)
data_max = multi_data.max(axis=1, keepdims=True).max(axis=0, keepdims=True)  # shape (1, 1, 2)
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1

Or:

data_min = multi_data.min(axis=(0,1), keepdims=True)  # reduce both axes in one call; shape (1, 1, 2)
data_max = multi_data.max(axis=(0,1), keepdims=True)
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1

Since you're taking the min/max over the first two dimensions, you can drop keepdims altogether: the result has shape (2,), which broadcasts against the trailing feature axis automatically:

data_min = multi_data.min(axis=(0,1))  # shape (2,): one value per feature
data_max = multi_data.max(axis=(0,1))
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1
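
A quick sanity check (a minimal sketch with random stand-in data) confirms that each feature now spans [-1, 1] across all batches rather than per batch:

import numpy as np

multi_data = np.random.rand(66, 5004, 2)  # stand-in for the real data

data_min = multi_data.min(axis=(0,1))
data_max = multi_data.max(axis=(0,1))
scaled = (2*(multi_data-data_min)/(data_max-data_min))-1

print(scaled.min(axis=(0,1)))  # [-1. -1.]
print(scaled.max(axis=(0,1)))  # [ 1.  1.]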

Upvotes: 4
