Reputation: 1033
I have a 3D NumPy array representing time series data, i.e. [number of samples, time steps, features].
I would like to scale each feature between -1 and 1. However, each feature should be scaled with respect to the maximum and minimum of all samples in the first dimension of my array. For example, my array is of shape:
multi_data.shape
(66, 5004, 2)
I tried the following:
data_min = multi_data.min(axis=1, keepdims=True)
data_max = multi_data.max(axis=1, keepdims=True)
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1
The problem is that this scales each "batch" (the first dimension of my array) independently. What I am trying to do is scale each feature (I have two) by its max and min values across all 66 batches, but I can't quite work out how to achieve this. Any pointers would be very welcome.
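To make the issue concrete: with `axis=1` the reduction keeps one min/max per sample, so each sample ends up scaled by its own extrema. A small sketch with a toy array (sizes made up for illustration) shows the shape of the result:

```python
import numpy as np

# Toy array with the same layout as in the question:
# (samples, time steps, features); sizes are made up for illustration.
multi_data = np.random.default_rng(0).normal(size=(4, 10, 2))

data_min = multi_data.min(axis=1, keepdims=True)

# One minimum per sample and feature, not per feature overall.
print(data_min.shape)  # (4, 1, 2)
```

Because `data_min` still varies along the first axis, every sample is normalized against its own range.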
Upvotes: 3
Views: 552
Reputation: 150735
How about chaining that with another min/max:
data_min = multi_data.min(axis=1, keepdims=True).min(axis=0, keepdims=True)
data_max = multi_data.max(axis=1, keepdims=True).max(axis=0, keepdims=True)
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1
Or:
data_min = multi_data.min(axis=(0,1), keepdims=True)
data_max = multi_data.max(axis=(0,1), keepdims=True)
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1
Since you're taking the min/max over the first two dimensions, you can drop keepdims and rely on broadcasting, which saves quite a bit of memory in this case:
data_min = multi_data.min(axis=(0,1))
data_max = multi_data.max(axis=(0,1))
multi_data = (2*(multi_data-data_min)/(data_max-data_min))-1
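A quick sanity check, using a toy array as a stand-in for the real (66, 5004, 2) data, confirms that each feature now spans exactly [-1, 1] across all samples and time steps:

```python
import numpy as np

# Toy stand-in for the (66, 5004, 2) array from the question.
multi_data = np.random.default_rng(1).normal(size=(6, 50, 2))

data_min = multi_data.min(axis=(0, 1))  # shape (2,): per-feature min
data_max = multi_data.max(axis=(0, 1))  # shape (2,): per-feature max
scaled = (2 * (multi_data - data_min) / (data_max - data_min)) - 1

# Each feature's extrema across all samples and time steps are -1 and 1.
print(scaled.min(axis=(0, 1)))  # [-1. -1.]
print(scaled.max(axis=(0, 1)))  # [1. 1.]
```

The `(2,)`-shaped `data_min`/`data_max` broadcast against the trailing feature axis of the 3D array, which is why `keepdims` isn't needed here.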
Upvotes: 4