Python arrays dimension issues

Question

I am once again struggling with Python, NumPy and arrays while computing some operations between matrices.

The part of the code that is probably not working properly is:

train, test, cv = np.array_split(data, 3, axis=0)
train_inputs = train[:, :-1]   # every column except the last
test_inputs = test[:, :-1]
cv_inputs = cv[:, :-1]

train_outputs = train[:, -1]   # last column only
test_outputs = test[:, -1]
cv_outputs = cv[:, -1]

Printing the np.ndim, np.shape and dtype of each of these arrays gives:

array           ndim  shape        dtype
train_inputs    2     (94936, 30)  float64
train_outputs   1     (94936,)     float64
test_inputs     2     (94936, 30)  float64
test_outputs    1     (94936,)     float64
cv_inputs       2     (94935, 30)  float64
cv_outputs      1     (94935,)     float64
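The one-row difference for cv comes from np.array_split, which, unlike np.split, does not require the rows to divide evenly. A quick check, assuming the full data set has 284807 rows (the sum of the three row counts above):

```python
import numpy as np

# 94936 + 94936 + 94935 = 284807 rows in total (inferred from the shapes above).
# np.array_split gives one extra row to each of the first (n_rows % 3) sections,
# which is why cv ends up one row shorter than train and test.
rows = np.arange(284807)
sizes = [len(part) for part in np.array_split(rows, 3)]
print(sizes)  # [94936, 94936, 94935]
```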

I believe all of the *_outputs arrays are missing one dimension.
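Indexing the last column with an integer (train[:, -1]) drops that axis, which is why the *_outputs arrays are 1-D. If a column vector is wanted, slicing with -1: or reshaping keeps the second axis. A small sketch on a toy array (the 5×4 data here is made up for illustration):

```python
import numpy as np

# Toy stand-in for `data`: 5 samples, 3 feature columns + 1 output column.
data = np.arange(20, dtype=np.float64).reshape(5, 4)

outputs_1d = data[:, -1]                 # integer index drops the axis -> (5,)
outputs_2d = data[:, -1:]                # slice keeps the axis -> (5, 1)
outputs_rs = data[:, -1].reshape(-1, 1)  # explicit reshape -> (5, 1)

print(outputs_1d.shape, outputs_2d.shape, outputs_rs.shape)  # (5,) (5, 1) (5, 1)
```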

The other matrix I need is created by this command:

newMatrix = neuronLayer(30, 94936)

where neuronLayer is a class defined as:

class neuronLayer():
    def __init__(self, neurons, neuron_inputs):
        # random weights in [-1, 1), one row per input and one column per neuron
        self.weights = 2 * np.random.random((neuron_inputs, neurons)) - 1
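Because the constructor builds an array of shape (neuron_inputs, neurons), neuronLayer(30, 94936) produces weights with 94936 rows, the number of training samples, rather than 30 rows matching the 30 input columns. A quick check with the class as posted:

```python
import numpy as np

class neuronLayer():
    def __init__(self, neurons, neuron_inputs):
        # Uniform random weights in [-1, 1): one row per input, one column per neuron.
        self.weights = 2 * np.random.random((neuron_inputs, neurons)) - 1

layer = neuronLayer(30, 94936)
print(layer.weights.shape)  # (94936, 30)
```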

This is the error I end up with:

outputLayer1 = self.__sigmoid(np.dot(inputs, self.layer1.weights))
ValueError: shapes (94936,30) and (94936,30) not aligned: 30 (dim 1) != 94936 (dim 0)
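For 2-D arrays, np.dot(a, b) requires a.shape[1] == b.shape[0], so two arrays with the same (rows, cols) shape only multiply when they are square. The same error can be reproduced with small stand-ins (6 samples and 3 features instead of 94936 and 30):

```python
import numpy as np

# Same layout as the failing call: inputs (samples, features), weights (samples, features).
inputs = np.ones((6, 3))
weights = np.ones((6, 3))

try:
    np.dot(inputs, weights)
except ValueError as err:
    print(err)  # shapes (6,3) and (6,3) not aligned: 3 (dim 1) != 6 (dim 0)

# A (features, neurons) weight matrix does line up with inputs: (6, 3) . (3, 4) -> (6, 4).
ok = np.dot(inputs, np.ones((3, 4)))
print(ok.shape)  # (6, 4)
```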

Python is clearly telling me the matrix shapes don't line up, but I don't understand where the problem is.

Any tips?

PS: The full code is pasted here.

hpaulj · Accepted Answer

layer1 = neuronLayer(30, 94936)    # 30 neurons, each with 94936 inputs
layer2 = neuronLayer(1, 30)         # 1 neuron taking the 30 outputs of layer1

where neuronLayer creates

self.weights = 2 * np.random.random((neuron_inputs, neurons)) - 1

the two weight arrays have shapes (94936,30) and (30,1).

This line does not make any sense. I'm surprised it doesn't give an error:

layer1error = layer2delta.dot(self.layer2.weights.np.transpose)

I suspect you want np.transpose(self.layer2.weights) or self.layer2.weights.T.
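To see why the line would fail as soon as it is reached, and that the two suggested spellings agree, a minimal check on a (30,1) array shaped like layer2.weights:

```python
import numpy as np

weights = 2 * np.random.random((30, 1)) - 1   # same shape as layer2.weights

# ndarray has no attribute called `np`, so the original expression
# raises AttributeError when evaluated.
try:
    weights.np.transpose
except AttributeError as err:
    print(err)  # 'numpy.ndarray' object has no attribute 'np'

# The two suggested fixes are equivalent for a 2-D array.
print(np.array_equal(weights.T, np.transpose(weights)))  # True
print(weights.T.shape)  # (1, 30)
```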

But maybe execution never gets that far. train first calls think with a (94936,30) inputs array:

    outputLayer1 = self.__sigmoid(np.dot(inputs, self.layer1.weights))
    outputLayer2 = self.__sigmoid(np.dot(outputLayer1, self.layer2.weights))

So it tries to compute np.dot of two (94936,30) arrays, which aren't compatible for a dot product. You could transpose one or the other, producing either a (94936,94936) array or a (30,30) one. The first looks far too big; the (30,30) one is compatible with the weights of the second layer.
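The two transpose options can be compared at toy size (6 samples and 3 features standing in for 94936 and 30). At the real size, a float64 (94936, 94936) array would need 94936² × 8 bytes, about 72 GB, which is why that option is out:

```python
import numpy as np

inputs = np.ones((6, 3))    # like train_inputs, (samples, features)
weights1 = np.ones((6, 3))  # like layer1.weights, (samples, features)

print(np.dot(inputs, weights1.T).shape)  # (6, 6): samples x samples, grows quadratically
print(np.dot(inputs.T, weights1).shape)  # (3, 3): features x features
```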

np.dot(inputs.T, self.layer1.weights)

has a chance of working right.

np.dot(outputLayer1, self.layer2.weights)
(30,30) with (30,1) => (30,1)
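The shape flow of this suggestion through both layers can be traced at toy size (6 samples and 3 features standing in for 94936 and 30; the random values are placeholders and sigmoid is a local stand-in for self.__sigmoid):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

inputs = np.random.random((6, 3))          # like train_inputs, (94936, 30)
w1 = 2 * np.random.random((6, 3)) - 1      # like layer1.weights, (94936, 30)
w2 = 2 * np.random.random((3, 1)) - 1      # like layer2.weights, (30, 1)

out1 = sigmoid(np.dot(inputs.T, w1))       # (3, 6) . (6, 3) -> (3, 3)
out2 = sigmoid(np.dot(out1, w2))           # (3, 3) . (3, 1) -> (3, 1)
print(out1.shape, out2.shape)              # (3, 3) (3, 1)
```

Note that the final result has one row per feature rather than one row per sample.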

But then you do

train_outputs - outputLayer2

That will cause problems whether train_outputs is (94936,) or (94936,1).
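The two cases go wrong in different ways, which makes the 1-D case easy to miss: it does not raise, it broadcasts. With toy sizes (6 samples, 3 features) standing in for the real ones:

```python
import numpy as np

out2 = np.zeros((3, 1))        # stands in for outputLayer2, (30, 1)
targets_1d = np.ones(6)        # stands in for train_outputs, (94936,)
targets_2d = np.ones((6, 1))   # the (94936, 1) variant

# (6,) - (3, 1) does not raise: broadcasting silently builds a (3, 6) array.
print((targets_1d - out2).shape)  # (3, 6)

# (6, 1) - (3, 1) cannot be broadcast (6 vs 3 along the first axis).
try:
    targets_2d - out2
except ValueError as err:
    print("ValueError:", err)
```

At full size the first case would quietly produce a (30, 94936) "error" array.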

You need to make sure that array shapes flow correctly through the calculation. Don't just check them at the start; check them internally too, and make sure you understand what shapes they should have at each step.
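One way to check shapes internally is to assert them after every step of the forward pass. A minimal sketch; think here is a hypothetical stand-alone helper (not the code from the question), and it assumes one sample per row, the layout train_inputs already has, with w1 of shape (n_features, n_hidden):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def think(inputs, w1, w2):
    """Forward pass with a shape check at every step (hypothetical helper)."""
    n_samples, n_features = inputs.shape
    assert w1.shape[0] == n_features, f"w1 {w1.shape} vs inputs {inputs.shape}"
    hidden = sigmoid(inputs @ w1)
    assert hidden.shape == (n_samples, w1.shape[1]), hidden.shape
    assert w2.shape[0] == hidden.shape[1], f"w2 {w2.shape} vs hidden {hidden.shape}"
    output = sigmoid(hidden @ w2)
    assert output.shape == (n_samples, w2.shape[1]), output.shape
    return hidden, output

# With weights sized by samples (as in the question), the first check fails at once.
try:
    think(np.ones((6, 3)), np.ones((6, 3)), np.ones((3, 1)))
except AssertionError as err:
    print("shape check failed:", err)
```

Plain assert statements are skipped when Python runs with -O, so they suit development rather than production validation.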

It would be a whole lot easier to develop and test this code with much smaller inputs and layers, something like 10 samples and 3 features. That way you can look at the values as well as the shapes.
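A sketch of the suggested toy setup: 10 samples with 3 features plus 1 output column, split the same way as in the question. Unlike the inputs.T suggestion, this assumes each layer's input count equals the number of features (3) and uses one sample per row throughout; the 4 hidden neurons and the seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

data = rng.random((10, 4))                          # 3 features + 1 output column
train, test, cv = np.array_split(data, 3, axis=0)   # 4, 3 and 3 rows

train_inputs = train[:, :-1]    # (4, 3)
train_outputs = train[:, -1:]   # (4, 1), kept 2-D to match the network output

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

w1 = 2 * rng.random((3, 4)) - 1   # like neuronLayer(4, 3).weights
w2 = 2 * rng.random((4, 1)) - 1   # like neuronLayer(1, 4).weights

hidden = sigmoid(train_inputs @ w1)   # (4, 3) @ (3, 4) -> (4, 4)
output = sigmoid(hidden @ w2)         # (4, 4) @ (4, 1) -> (4, 1)
error = train_outputs - output        # (4, 1) - (4, 1) -> (4, 1)

for name, arr in [("train_inputs", train_inputs), ("hidden", hidden),
                  ("output", output), ("error", error)]:
    print(name, arr.shape)
print(error)   # small enough to inspect the actual values
```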
