Normalization: how to avoid zero standard deviation

Question

I have the following task:

Normalize the matrix by columns. From each value in a column, subtract the column's average and divide the result by the column's standard deviation. Your output should not contain NaN (caused by division by zero). Replace NaNs with 1. Don't use if or while/for.

I am working with numpy, so I wrote the following code:

import numpy as np

def normalize(matrix: np.ndarray) -> np.ndarray:
    # Subtract each column's mean and divide by that column's standard deviation
    res = (matrix - np.mean(matrix, axis=0)) / np.std(matrix, axis=0, dtype=np.float64)
    return res

matrix = np.array([[1, 4, 4200], [0, 10, 5000], [1, 2, 1000]])
assert np.allclose(
    normalize(matrix),
    np.array([[ 0.7071, -0.39223,  0.46291],
              [-1.4142,  1.37281,  0.92582],
              [ 0.7071, -0.98058, -1.38873]])
)

The result is correct.

However, my question is: how do I avoid division by zero? If I have a column of identical numbers, its standard deviation is 0 and I get NaN values in the result. How do I solve this? I would be grateful for any help!
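A minimal reproduction of the problem, as a sketch: the small matrix below is made up for illustration, and np.errstate is used only to silence the RuntimeWarning that numpy emits for 0 / 0.

```python
import numpy as np

# Illustrative data: the first column is constant, so its standard deviation is 0
matrix = np.array([[2, 4], [2, 10], [2, 2]])

# 0 / 0 is an "invalid" floating-point operation; suppress the warning for the demo
with np.errstate(invalid="ignore"):
    res = (matrix - np.mean(matrix, axis=0)) / np.std(matrix, axis=0, dtype=np.float64)

# Per-column flag: True where every entry of the column is NaN
nan_columns = np.isnan(res).all(axis=0)
print(nan_columns)
```

Here only the constant first column comes out as NaN; the second column is normalized as usual.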

FlyingTeller · Accepted Answer

Your task specifies to avoid nan in the output and replace nan that occur with 1. It does not specify that intermediate results may not contain nan. A valid solution can be to use numpy.nan_to_num on res before returning:

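import numpy as np

def normalize(matrix: np.ndarray) -> np.ndarray:
    # A constant column has std 0, so 0 / 0 gives nan (numpy also emits a RuntimeWarning)
    res = (matrix - np.mean(matrix, axis=0)) / np.std(matrix, axis=0, dtype=np.float64)
    # Replace every nan with 1.0; copy=False modifies res in place
    return np.nan_to_num(res, copy=False, nan=1.0)

matrix = np.array([[2, 4, 4200], [2, 10, 5000], [2, 2, 1000]])
print(normalize(matrix))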

yields:

[[ 1.         -0.39223227  0.46291005]
 [ 1.          1.37281295  0.9258201 ]
 [ 1.         -0.98058068 -1.38873015]]
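If the RuntimeWarning from the division is a concern, one possible alternative (a sketch, not part of the answer above) is to skip the division for zero-variance columns altogether. It relies on the where and out parameters of np.divide: out is pre-filled with ones, and only positions where the standard deviation is non-zero get overwritten. The name normalize_masked is hypothetical.

```python
import numpy as np

def normalize_masked(matrix: np.ndarray) -> np.ndarray:
    # Hypothetical helper: same normalization, but no division by zero is performed
    centered = matrix - np.mean(matrix, axis=0)
    std = np.std(matrix, axis=0, dtype=np.float64)
    # Pre-fill the result with 1.0, the value required for zero-variance columns
    out = np.ones_like(centered, dtype=np.float64)
    # Divide only where std is non-zero; other positions keep the pre-filled 1.0
    np.divide(centered, std, out=out, where=std != 0)
    return out

matrix = np.array([[2, 4, 4200], [2, 10, 5000], [2, 2, 1000]])
result = normalize_masked(matrix)
print(result)
```

Like the nan_to_num version, this uses no if, for or while. One difference to keep in mind: it only handles columns with zero standard deviation, so NaN values already present in the input would still propagate to the output.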
