Cat

Reputation: 13

Shapes of np.arrays: unexpected additional dimension

I'm working with arrays in Python, and this raised a few questions.

1) I build a list of lists by reading 4 columns from N files, so the list holds N entries of 4 elements each. I then convert this list into a NumPy array:

s = np.array(s)

When I ask for the shape of this array, the answer is as expected:

print(s.shape)
#(N,4)

I then compute the mean of this Nx4 array:

s_m = sum(s)/len(s)
print(s_m.shape)
#(4,)

I guess this means that s_m is a 1D array. Is that correct?
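A quick way to check is the array's ndim attribute. The sketch below stands in for the file-reading step with a hard-coded list of lists (the values are made up for illustration). It shows that the builtin sum adds the rows together one by one, so the result has the shape of a single row, and that s.mean(axis=0) gives the same (4,) result.

```python
import numpy as np

# Hypothetical stand-in for the rows read from N = 3 files (4 columns each)
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0],
]
s = np.array(rows)
print(s.shape, s.ndim)           # (3, 4) 2

# Builtin sum iterates over the first axis, adding the 4-element rows together
s_m = sum(s) / len(s)
print(s_m.shape, s_m.ndim)       # (4,) 1

# The NumPy equivalent, reducing along axis 0
s_m_np = s.mean(axis=0)
print(np.allclose(s_m, s_m_np))  # True
```

So yes: a shape of (4,) with ndim == 1 is a 1D array, not a row or column matrix.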

2) To subtract the mean vector s_m from each row of s, I can proceed in two ways:

residuals_s = s - s_m

or:

residuals_s = []

for i in range(len(s)):
    residuals_s.append([])
    tmp = s[i] - s_m
    residuals_s.append(tmp)

If I now ask for the shape of residuals_s in the two cases, I get two different answers. In the first case I get:

(N,4)

In the second:

(N,1,4)

Can someone explain why there is an additional dimension?

Upvotes: 0

Views: 294

Answers (2)

hpaulj

Reputation: 231335

You can get the mean using the numpy method (producing the same (4,) shape):

s_m = s.mean(axis=0)

s - s_m works because s_m is broadcast to the shape of s.
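Broadcasting lines shapes up from the right: s_m with shape (4,) is treated as if it had shape (1, 4) and is stretched across all N rows of s. The sketch below uses random data (seeded only so it is reproducible) to check that the one-line subtraction matches both an explicit (1, 4) version of the mean and a per-row loop.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(size=(10, 4))
s_m = s.mean(axis=0)                 # shape (4,)

# Broadcast: (10, 4) - (4,) -> (10, 4)
res_broadcast = s - s_m

# Same thing with the length-1 leading axis made explicit: (10, 4) - (1, 4)
res_explicit = s - s_m[np.newaxis, :]

# Row-by-row loop, stacked back into a 2D array
res_loop = np.array([row - s_m for row in s])

print(np.broadcast(s, s_m).shape)    # (10, 4)
print(np.allclose(res_broadcast, res_explicit), np.allclose(res_broadcast, res_loop))  # True True
```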

If I run your second residuals_s loop, I get a list containing empty lists and arrays:

[[],
 array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ]),
 [],
 array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ]),
 ...
]

That does not convert to an (N,1,4) array, but rather to an (M,) array with dtype=object (here M = 2N). Did you copy and paste correctly?
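The doubling is easy to confirm: the loop as posted appends two items per row (an empty list, then the residual array), so the list ends up with 2N entries rather than N. A minimal sketch with random data in place of the real files follows. Note that recent NumPy releases (1.24 and later) refuse to build an array from such a ragged list at all unless dtype=object is passed, raising a ValueError instead.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(size=(5, 4))
s_m = s.mean(axis=0)

# The loop as posted in the question: two appends per iteration
residuals_s = []
for i in range(len(s)):
    residuals_s.append([])           # an empty list...
    tmp = s[i] - s_m
    residuals_s.append(tmp)          # ...followed by the residual row

print(len(residuals_s))              # 10, i.e. 2 * N
print(residuals_s[0], residuals_s[1].shape)  # [] (4,)

# Dropping the empty placeholders leaves N rows that stack cleanly into (N, 4)
rows_only = [r for r in residuals_s if len(r) > 0]
print(np.array(rows_only).shape)     # (5, 4)
```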

A corrected iteration:

residuals_s = []
for i in range(len(s)):
    residuals_s.append(s[i]-s_m)

This produces a simpler list of arrays:

[array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ]),
 array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ]),
...]

which converts to an (N,4) array.

Iteration like this is usually not needed. But if it is, appending to a list like this is one way to go. Another is to pre-allocate an array and assign rows:

residuals_s = np.zeros_like(s)
for i in range(s.shape[0]):
    residuals_s[i,:] = s[i]-s_m
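To check that the three routes agree (a list of row arrays converted once at the end, a pre-allocated array filled row by row, and plain broadcasting), a small comparison on random data might look like this. One caveat: np.zeros_like(s) copies the dtype of s, so the pre-allocation approach relies on s being a float array; with an integer s the fractional part of each residual would be truncated on assignment.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.normal(size=(6, 4))
s_m = s.mean(axis=0)

# 1) Append each residual row to a list, then convert once at the end
res_list = []
for i in range(len(s)):
    res_list.append(s[i] - s_m)
res_list = np.array(res_list)

# 2) Pre-allocate an array with the same shape and dtype as s, then fill rows
res_prealloc = np.zeros_like(s)
for i in range(s.shape[0]):
    res_prealloc[i, :] = s[i] - s_m

# 3) No loop at all: broadcasting
res_broadcast = s - s_m

print(res_list.shape, res_prealloc.shape, res_broadcast.shape)  # (6, 4) (6, 4) (6, 4)
print(np.allclose(res_list, res_broadcast) and np.allclose(res_prealloc, res_broadcast))  # True
```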

I get your (N,1,4) with:

In [39]: residuals_s=[]
In [40]: for i in range(len(s)):
   ....:     residuals_s.append([])
   ....:     tmp = s[i] - s_m
   ....:     residuals_s[-1].append(tmp)
In [41]: residuals_s
Out[41]: 
[[array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ])],
 [array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ])],
...]
In [43]: np.array(residuals_s).shape
Out[43]: (10, 1, 4)

Here the s[i]-s_m array is appended to an empty list, which has already been appended to the main list. So it's an array within a list within a list. It's this intermediate list that produces the middle dimension of length 1.
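If you already have the nested (N,1,4) result, the length-1 middle axis can be dropped afterwards. A sketch using np.squeeze with an explicit axis (so only that axis is removed, even if N happened to be 1) or plain indexing:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(size=(10, 4))
s_m = s.mean(axis=0)

# Reproduce the nested structure: each row array sits inside its own one-element list
residuals_s = []
for i in range(len(s)):
    residuals_s.append([])
    residuals_s[-1].append(s[i] - s_m)

nested = np.array(residuals_s)
print(nested.shape)                  # (10, 1, 4)

# Remove only the length-1 axis 1
flat = np.squeeze(nested, axis=1)
print(flat.shape)                    # (10, 4)

# Indexing that axis with 0 does the same job
print(np.array_equal(flat, nested[:, 0, :]))  # True
```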

Upvotes: 1

HYRY

Reputation: 97261

You are using NumPy ndarrays without using NumPy's functions: sum() is the Python builtin, and you should use numpy.sum() instead.

I suggest changing your code to:

import numpy as np
np.random.seed(0)
s = np.random.randn(10, 4)
s_m = np.mean(s, axis=0, keepdims=True)
residuals_s = s - s_m

print(s.shape, s_m.shape, residuals_s.shape)

Using the mean() function with the axis and keepdims arguments will give you the correct result.
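The practical difference keepdims=True makes is the shape of the mean: (1, 4) instead of (4,). For column means both forms broadcast to the same residuals; the difference shows up with row means (axis=1), where a (10,) result cannot broadcast against a (10, 4) array but the (10, 1) keepdims result can. A short sketch on the same random data:

```python
import numpy as np

np.random.seed(0)
s = np.random.randn(10, 4)

# Column means: keepdims=True keeps the reduced axis as length 1
s_m_keep = np.mean(s, axis=0, keepdims=True)   # shape (1, 4)
s_m_flat = np.mean(s, axis=0)                  # shape (4,)
print(s_m_keep.shape, s_m_flat.shape)          # (1, 4) (4,)

# Both broadcast to the same residuals here
print(np.allclose(s - s_m_keep, s - s_m_flat)) # True

# Row means (axis=1): only the keepdims version lines up with s
row_m = np.mean(s, axis=1, keepdims=True)      # shape (10, 1)
print((s - row_m).shape)                       # (10, 4)
try:
    s - np.mean(s, axis=1)                     # (10, 4) - (10,) cannot broadcast
except ValueError as err:
    print("ValueError:", err)
```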

Upvotes: 0
