Reputation: 3165
I have a numpy matrix containing numbers.
1,0,1,1
0,1,1,1
0,0,1,0
1,1,1,1
I would like to perform z-score normalization over each column: z_score[y] = (y - mean(column)) / sqrt(var), where y is each element in the column, mean is the mean function, sqrt is the square root function, and var is the variance of the column.
My approach was the following:
x_trainT = x_train.T  # transpose the matrix to iterate over columns
for item in x_trainT:
    m = item.mean()
    var = np.sqrt(item.var())
    item = (item - m)/var
x_train = x_trainT.T
I thought that, during iteration, each row is accessed by reference (as with C# lists, for instance), which would let me change the matrix values by changing the row values.
However, I was wrong: the matrix keeps its original values.
Your help is appreciated.
Upvotes: 0
Views: 1164
Reputation: 2647
I'd recommend avoiding explicit iteration where possible. You can compute the mean and standard deviation column-wise by passing axis=0:
>>> import numpy as np
>>> x_train = np.random.random((5, 8))
>>> norm_x_train = (x_train - x_train.mean(axis=0)) / x_train.std(axis=0)
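As a minimal sketch, here is the same idea applied to the 4x4 matrix from the question. The third column there is constant, so its standard deviation is 0 and a plain division would produce NaN. The helper name `zscore_columns` and the choice to map constant columns to 0 are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def zscore_columns(x):
    """Column-wise z-score; constant columns are mapped to 0 (an assumption)."""
    x = np.asarray(x, dtype=float)           # work in float so results are not truncated
    mean = x.mean(axis=0)                    # one mean per column
    std = x.std(axis=0)                      # population std (ddof=0), same as sqrt(var)
    safe_std = np.where(std == 0, 1.0, std)  # avoid dividing by zero
    return (x - mean) / safe_std

x_train = np.array([[1, 0, 1, 1],
                    [0, 1, 1, 1],
                    [0, 0, 1, 0],
                    [1, 1, 1, 1]])
norm_x_train = zscore_columns(x_train)
print(norm_x_train.mean(axis=0))  # each column mean is (numerically) 0
print(norm_x_train.std(axis=0))   # 1 for non-constant columns, 0 for the constant one
```

Broadcasting subtracts the per-column mean from every row, so no transpose or loop is needed.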
Upvotes: 2
Reputation: 429
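You'll likely have to index by row number, so that the result is assigned back into the array rather than to the loop variable:
x_trainT = x_train.T  # a view: writes to x_trainT also change x_train
for i in range(x_trainT.shape[0]):
    item = x_trainT[i]
    m = item.mean()
    sd = np.sqrt(item.var())
    x_trainT[i] = (item - m)/sd  # indexed assignment writes into the array
x_train = x_trainT.T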
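To illustrate why the loop in the question leaves the matrix unchanged: `item = ...` only rebinds the loop variable to a new array, whereas slice or indexed assignment writes into the existing buffer. The sketch below assumes a float array; with an integer array, in-place assignment would cast the z-scores back to integers. The names `a`, `b`, and `col` and the sample data are hypothetical.

```python
import numpy as np

a = np.array([[1.0, 0.0], [0.0, 0.0], [1.0, 2.0]])  # float dtype, hypothetical data
b = a.copy()

# Rebinding: `col` is pointed at a new array, so `a` is untouched.
for col in a.T:
    col = (col - col.mean()) / col.std()

# In-place: slice assignment writes through the view into `b`.
for col in b.T:
    col[:] = (col - col.mean()) / col.std()

print(np.array_equal(a, [[1.0, 0.0], [0.0, 0.0], [1.0, 2.0]]))  # a is unchanged
print(b.mean(axis=0))  # column means of b are now (numerically) 0
```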
Upvotes: 1