deeplearning

Reputation: 469

How to normalize a 4D array (not an image)?

I have a 4D array of shape (1948, 60, 2, 3) that holds the differences in end-effector positions (x, y, z) over 60 time steps. The first dimension (1948) is the number of samples, the second (60) is the number of time steps, the third (2) distinguishes left_arm and right_arm, and the last (3) holds the x, y, z positions.

A sample of how it looks:

array([[[  3.93048840e-05,   7.70215296e-04,   1.13865805e-03],
        [  1.11679799e-04,  -7.04810066e-04,   1.83552688e-04]],

       [[ -6.26468389e-04,   6.86668923e-04,   1.57112482e-04],
        [  3.68164582e-04,   7.98948528e-04,   4.50642200e-04]],

       [[  2.51472961e-04,  -2.48105983e-04,   7.52486843e-04],
        [  8.99905240e-05,   1.70473461e-04,  -3.09927572e-04]],

       [[ -7.52414330e-04,   5.46782063e-04,  -3.76679264e-04],
        [ -3.12531026e-04,  -3.36585211e-04,   5.79075595e-05]],

       [[  7.69968002e-04,  -1.95524291e-03,  -8.65666619e-04],
        [  2.37583215e-04,   4.59415986e-04,   6.07292643e-04]],

       [[  1.41795261e-03,  -1.62364401e-03,  -8.99673829e-04],

I want to normalize this data because I need to train a neural network on it. How do I go about normalizing a 4D array? I have an intuition for images, but not for this case. Should I normalize each sample individually, or should the normalization be computed over the entire 4D array?

Upvotes: 2

Views: 1199

Answers (2)

Divakar

Reputation: 221574

The trick is to set keepdims=True, which keeps the reduced axes as length-1 dimensions so that broadcasting works without the housekeeping of re-extending dims. Hence, a solution that handles ndarrays of any dimensionality would be:

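import numpy as np

# x: the input array, e.g. of shape (1948, 60, 2, 3)
# Get the min and max among all elements for each feature on the last axis (x, y, z),
# reducing over every other axis; keepdims=True keeps the reduced axes with size 1
x_min = np.min(x, axis=tuple(range(x.ndim-1)), keepdims=True)
x_max = np.max(x, axis=tuple(range(x.ndim-1)), keepdims=True)

# Normalize with those min, max values leveraging broadcasting
out = (x - x_min) / (x_max - x_min)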
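A quick usage check of that approach, wrapped in a hypothetical helper `minmax_normalize` and run on random data with the question's shape (the random array is only a stand-in, not the poster's data). The zero-range guard is an addition, not part of the answer above: without it, a constant feature would cause a division by zero.

```python
import numpy as np

def minmax_normalize(x):
    """Scale each feature on the last axis to [0, 1], pooling all other axes."""
    axes = tuple(range(x.ndim - 1))          # (0, 1, 2) for a 4D array
    x_min = np.min(x, axis=axes, keepdims=True)
    x_max = np.max(x, axis=axes, keepdims=True)
    span = x_max - x_min
    # Guard: a constant feature has zero range; divide by 1 instead of 0
    span = np.where(span == 0, 1, span)
    return (x - x_min) / span, x_min, x_max

gen = np.random.default_rng(0)
x = gen.normal(scale=1e-3, size=(1948, 60, 2, 3))  # synthetic stand-in data
out, x_min, x_max = minmax_normalize(x)

print(x_min.shape)                 # (1, 1, 1, 3): one min per x, y, z
print(out.min(axis=(0, 1, 2)))     # per-feature minimum after scaling
print(out.max(axis=(0, 1, 2)))     # per-feature maximum after scaling
```

Note that this pools samples, time steps and both arms together, so it yields exactly three statistics pairs, one per coordinate.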

Upvotes: 1

asakryukin

Reputation: 2594

First, yes, you can normalize this data; there is no problem with that.

Second, there is nothing special about 4D arrays. Normalization should simply be performed separately for each feature. Depending on the type of normalization, compute the max and min (or the mean and std) of each feature across all samples in the training set.
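One literal reading of "each feature across all samples" is to flatten every (time step, arm, coordinate) entry into its own feature, giving a (samples, 360) matrix. The sketch below assumes a hypothetical 80/20 train/test split on synthetic data: statistics come from the training rows only and are then reused for the held-out rows, so no test information leaks into the normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(scale=1e-3, size=(1948, 60, 2, 3))  # synthetic stand-in

# Hypothetical 80/20 split along the sample axis
n_train = int(0.8 * len(data))
train, test = data[:n_train], data[n_train:]

# Flatten each sample into 60 * 2 * 3 = 360 features
train_2d = train.reshape(len(train), -1)
test_2d = test.reshape(len(test), -1)

# Per-feature statistics, computed on the training set only
mean = train_2d.mean(axis=0)
std = train_2d.std(axis=0)
std[std == 0] = 1.0  # avoid division by zero for constant features

train_norm = (train_2d - mean) / std
test_norm = (test_2d - mean) / std  # same statistics reused on unseen data

# Restore the original 4D layout if the model expects it
train_norm = train_norm.reshape(train.shape)
test_norm = test_norm.reshape(test.shape)
```

This is the most fine-grained choice (360 features); the decisions below may let you pool more data into fewer statistics.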

In your case, you have to decide which parts of the data come from the same distribution, so decide for each dimension:

1) The first dimension is just the number of samples, so it doesn't define a new distribution. Treat it as the index of data entries.

2) Time step. Here you have to decide: do the x, y, z values have a distinct distribution at each of the 60 time steps? If not, treat this dimension the same way as the previous one. If so, calculate max and min (or mean and std) separately for each time step. (Put simply: can the arm at step 0 take values similar to those at later steps, such as step 30 or the last step? If yes, all time steps are just more data entries; if no, you get 60 times as many features.)

3) Do the left and right arms have different distributions of x, y, z values? If yes, again calculate the statistics separately. (I guess they do, because the left and right arms statistically tend to occupy different points in space.)

4) The x, y, z values definitely follow different distributions, so calculate them separately.
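The four decisions above amount to choosing which axes to pool over when computing statistics. A small sketch, with a hypothetical helper `pooled_axes`, that maps the yes/no answers for time steps and arms to reduction axes and reports how many features (sets of statistics) result; it assumes samples are always pooled and x, y, z are always kept separate, as items 1) and 4) state.

```python
import numpy as np

def pooled_axes(per_time_step: bool, per_arm: bool) -> tuple:
    """Axes of a (samples, time, arm, xyz) array to pool when computing statistics.

    Samples (axis 0) are always pooled; x, y, z (axis 3) are never pooled.
    """
    axes = [0]
    if not per_time_step:
        axes.append(1)
    if not per_arm:
        axes.append(2)
    return tuple(axes)

shape = (1948, 60, 2, 3)
for per_time_step in (False, True):
    for per_arm in (False, True):
        axes = pooled_axes(per_time_step, per_arm)
        # Number of features = product of the dimensions that are NOT pooled
        n_features = int(np.prod([d for i, d in enumerate(shape) if i not in axes]))
        print(f"per_time_step={per_time_step!s:5} per_arm={per_arm!s:5} "
              f"axes={axes} features={n_features}")
```

The four combinations give 3, 6, 180 or 360 features.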

Once you have decided, you will have between 3 and 360 features (at most 60 × 2 × 3 = 360, depending on your decisions). Calculate the necessary statistics for them (max and min, or mean and std) and apply the standard normalization routine.
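A minimal sketch of that routine (mean/std variant) that keeps the 4D layout, assuming the pooled axes have already been chosen. `fit_stats` and `apply_stats` are hypothetical names, the data are synthetic, and the example assumes time steps are pooled while the two arms are kept apart, which gives 6 features.

```python
import numpy as np

def fit_stats(train, axes):
    """Per-feature mean and std over the pooled axes, kept 4D for broadcasting."""
    mean = train.mean(axis=axes, keepdims=True)
    std = train.std(axis=axes, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return mean, std

def apply_stats(x, mean, std):
    """Standardize x with statistics fitted on the training set."""
    return (x - mean) / std

rng = np.random.default_rng(1)
train = rng.normal(scale=1e-3, size=(1500, 60, 2, 3))  # synthetic stand-in
test = rng.normal(scale=1e-3, size=(448, 60, 2, 3))

axes = (0, 1)  # pool samples and time steps; keep arms and x, y, z separate
mean, std = fit_stats(train, axes)
print(mean.shape)  # (1, 1, 2, 3): 6 features

train_norm = apply_stats(train, mean, std)
test_norm = apply_stats(test, mean, std)
```

Store `mean` and `std` alongside the model so that new data at inference time is normalized with exactly the same statistics.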

Hope it helps!

Upvotes: 0
