Reputation: 5923
I'm trying to understand why the accuracy of my algorithm has suddenly changed quite dramatically. One small change I made was adding a fourth : when I discovered that I was using only 3 indexes to standardize my 4-dimensional train/test set. Now I'm curious: would the old and new code below do the same thing? If not, how does indexing into a 4-dimensional array with only 3 indexes work?
Old:
# standardize all non-binary variables
channels = 14  # int(X.shape[1])
mu_f = np.zeros(shape=channels)
sigma_f = np.zeros(shape=channels)
for i in range(channels):
    mu_f[i] = np.mean(X_train[:, i, :])
    sigma_f[i] = np.std(X_train[:, i, :])
for i in range(channels):
    X_train[:, i, :] -= mu_f[i]
    X_test[:, i, :] -= mu_f[i]
    if sigma_f[i] != 0:
        X_train[:, i, :] /= sigma_f[i]
        X_test[:, i, :] /= sigma_f[i]
New:
# standardize all non-binary variables
channels = 14
mu_f = np.zeros(shape=channels)
sigma_f = np.zeros(shape=channels)
for i in range(channels):
    mu_f[i] = np.mean(X_train[:, i, :, :])
    sigma_f[i] = np.std(X_train[:, i, :, :])
for i in range(channels):
    X_train[:, i, :, :] -= mu_f[i]
    X_test[:, i, :, :] -= mu_f[i]
    if sigma_f[i] != 0:
        X_train[:, i, :, :] /= sigma_f[i]
        X_test[:, i, :, :] /= sigma_f[i]
Upvotes: 1
Views: 50
Reputation: 231385
I don't see why the extra : makes a difference. It doesn't when I do time tests on a simple np.mean(X[:,1]) vs np.mean(X[:,1,:,:]), etc.
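To make that concrete, here is a quick sketch (not from the original post; the array and its shape are made up for illustration) showing that omitting trailing indexes is the same as writing : for the remaining axes:

```python
import numpy as np

# A small 4-D array; any shape works, this one is arbitrary.
X = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)

# Omitted trailing axes are implicitly full slices, so X[:, 1]
# selects exactly the same sub-array as X[:, 1, :, :].
print(np.array_equal(X[:, 1], X[:, 1, :, :]))      # True
print(np.mean(X[:, 1]) == np.mean(X[:, 1, :, :]))  # True
```

So the old and new code compute identical values; any accuracy change must come from somewhere else.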
As for plonser's suggestion that you can vectorize the whole thing, the key is realizing that mean and std take some added parameters. Check their docs and play around with sample arrays.
Xmean = np.mean(X, axis=(0,2,3), keepdims=True)
Xstd = np.std(X, axis=(0,2,3), keepdims=True)
X -= Xmean
X /= Xstd
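As a sanity check, here is a sketch (array shape and variable names are illustrative, not from the post) verifying that the vectorized axis/keepdims version matches the per-channel loop from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(8, 14, 6, 6))

# Loop version, per channel, as in the question's code.
X_loop = X.copy()
for i in range(X.shape[1]):
    mu = np.mean(X_loop[:, i, :, :])
    sigma = np.std(X_loop[:, i, :, :])
    X_loop[:, i, :, :] -= mu
    if sigma != 0:
        X_loop[:, i, :, :] /= sigma

# Vectorized version: reduce over every axis except the channel
# axis; keepdims=True keeps the result shaped (1, 14, 1, 1) so
# broadcasting lines it up against X.
Xmean = np.mean(X, axis=(0, 2, 3), keepdims=True)
Xstd = np.std(X, axis=(0, 2, 3), keepdims=True)
X_vec = (X - Xmean) / Xstd

print(np.allclose(X_loop, X_vec))  # True
```

Note the division should use the per-channel std, not the mean, to standardize.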
Upvotes: 2