Reputation: 357
I have the following task:
Normalize the matrix by columns: from each value in a column, subtract the column's mean and divide by the column's standard deviation. The output must not contain NaN (caused by division by zero); replace NaNs with 1. Don't use if, while, or for.
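In formula form, for the entry $x_{ij}$ in row $i$ and column $j$ of an $n$-row matrix, the task asks for the column-wise z-score below. This assumes the population standard deviation (dividing by $n$), which is what `np.std` computes by default (`ddof=0`); the expected values in the question's `assert` match this choice.

```latex
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}, \qquad
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}
```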
I am working with NumPy, so I wrote the following code:
import numpy as np

def normalize(matrix: np.ndarray) -> np.ndarray:
    # Subtract each column's mean, then divide by that column's standard deviation
    res = (matrix - np.mean(matrix, axis=0)) / np.std(matrix, axis=0, dtype=np.float64)
    return res

matrix = np.array([[1, 4, 4200], [0, 10, 5000], [1, 2, 1000]])
assert np.allclose(
    normalize(matrix),
    np.array([[ 0.7071, -0.39223,  0.46291],
              [-1.4142,  1.37281,  0.92582],
              [ 0.7071, -0.98058, -1.38873]])
)
This gives the correct result.
However, my question is: how do I avoid division by zero? If a column contains identical values, its standard deviation is 0 and the result contains NaN. How do I solve this? I'd be grateful for any help!
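To see where the NaN comes from, here is a small sketch with a constant first column: subtracting the column mean leaves zeros, the column's standard deviation is exactly 0, and 0/0 is NaN in IEEE floating point. The `np.errstate` block is only there to silence the RuntimeWarning NumPy would otherwise print; it does not change the result.

```python
import numpy as np

# A matrix whose first column is constant (all values equal to 2).
matrix = np.array([[2, 4, 4200], [2, 10, 5000], [2, 2, 1000]])

centered = matrix - matrix.mean(axis=0)  # first column becomes [0., 0., 0.]
std = matrix.std(axis=0)                 # first entry is 0.0

# 0 / 0 is NaN in floating point; errstate only suppresses the warnings.
with np.errstate(divide="ignore", invalid="ignore"):
    res = centered / std

print(std[0])               # 0.0
print(np.isnan(res[:, 0]))  # [ True  True  True]
```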
Upvotes: 1
Views: 1587
Reputation: 20492
Your task says to avoid NaN in the output and to replace any NaN that occurs with 1. It does not say that intermediate results may not contain NaN.
A valid solution is therefore to apply numpy.nan_to_num to res before returning:
import numpy as np

def normalize(matrix: np.ndarray) -> np.ndarray:
    res = (matrix - np.mean(matrix, axis=0)) / np.std(matrix, axis=0, dtype=np.float64)
    # Replace the NaNs produced by 0/0 in constant columns with 1 (in place, no copy)
    return np.nan_to_num(res, copy=False, nan=1.0)

matrix = np.array([[2, 4, 4200], [2, 10, 5000], [2, 2, 1000]])
print(normalize(matrix))
yields:
[[ 1.         -0.39223227  0.46291005]
 [ 1.          1.37281295  0.9258201 ]
 [ 1.         -0.98058068 -1.38873015]]
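If you would rather never create NaN at all (and avoid NumPy's "invalid value" RuntimeWarning), a variation not taken from the answer above is to let `np.divide` skip zero-std columns through the `out` and `where` arguments that NumPy ufuncs accept. This is a sketch assuming an exact zero is the only case to guard against; like the original, it uses no `if`, `while`, or `for`.

```python
import numpy as np


def normalize(matrix: np.ndarray) -> np.ndarray:
    """Column-wise z-score; columns with zero standard deviation become all ones."""
    centered = matrix - matrix.mean(axis=0)
    std = matrix.std(axis=0)
    # Start from an array of ones and divide only where std != 0
    # (the 1-D mask broadcasts across rows), so constant columns keep
    # the fill value 1 and no NaN is ever produced.
    return np.divide(
        centered,
        std,
        out=np.ones_like(centered, dtype=np.float64),
        where=std != 0,
    )


matrix = np.array([[2, 4, 4200], [2, 10, 5000], [2, 2, 1000]])
print(normalize(matrix))
```

Entries where the `where` mask is False are never computed, so they keep the value already in `out` (here 1). It prints the same values as the `nan_to_num` version.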
Upvotes: 1