Tarun Kumar

Reputation: 875

Sum along axis in numpy array

I want to understand how ndarray.sum(axis=...) works. I know that axis=0 is for columns and axis=1 is for rows, but in the case of 3 dimensions (3 axes) it's difficult to interpret the result below.

import numpy as np
arr = np.arange(0,30).reshape(2,3,5)

arr
Out[1]: 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])

arr.sum(axis=0)
Out[2]: 
array([[15, 17, 19, 21, 23],
       [25, 27, 29, 31, 33],
       [35, 37, 39, 41, 43]])


arr.sum(axis=1)
Out[8]: 
array([[15, 18, 21, 24, 27],
       [60, 63, 66, 69, 72]])

arr.sum(axis=2)
Out[3]: 
array([[ 10,  35,  60],
       [ 85, 110, 135]])

In this example of a 3-axis array of shape (2,3,5), there are 3 rows and 5 columns. But if I look at the array as a whole, it seems to have only two rows (each with 3 array elements).

Can anyone please explain how this sum works on arrays with 3 or more axes (dimensions)?

Upvotes: 16

Views: 21872

Answers (7)

hpaulj

Reputation: 231335

numpy displays a (2,3,5) array as 2 blocks of 3x5 arrays (3 rows, 5 columns). Or call them 'planes' (MATLAB would show it as 5 blocks of 2x3).

The numpy display also matches a nested list - a list of two sublists; each with 3 sublists. Each of those is 5 elements long.
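A quick way to see this nesting is to convert the array to plain Python lists with `tolist()` and check the length at each level (a small sketch, assuming NumPy is imported as `np`):

```python
import numpy as np

arr = np.arange(0, 30).reshape(2, 3, 5)
nested = arr.tolist()  # plain Python nested lists

print(len(nested))        # 2 sublists (axis 0)
print(len(nested[0]))     # each holds 3 sublists (axis 1)
print(len(nested[0][0]))  # each of those holds 5 numbers (axis 2)
print(nested[1][2])       # [25, 26, 27, 28, 29]
```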

In the 3x5 2d case, axis 0 sums along the size-3 dimension, resulting in a 5-element array. The descriptions 'sum over rows' or 'sum along columns' are a little vague in English. Focus on the results, the change in shape, and which values are being summed, not on the description.
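For the 2d case, checking the shapes makes it unambiguous which dimension disappears. The array `a` below is just an illustrative 3x5 example:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)   # 3 rows, 5 columns
s0 = a.sum(axis=0)                # the size-3 dimension is summed away
s1 = a.sum(axis=1)                # the size-5 dimension is summed away

print(s0.shape, s0)  # (5,) [15 18 21 24 27]
print(s1.shape, s1)  # (3,) [10 35 60]
```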

Back to the 3d case:

With axis=0, it sums along the 1st dimension, effectively removing it and leaving a 3x5 array: 0+15=15, 1+16=17, etc.

Axis 1 condenses the size-3 dimension; the result is 2x5: 0+5+10=15, etc.

Axis 2 condenses the size-5 dimension; the result is 2x3, e.g. sum((0,1,2,3,4)) = 10.
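Each of these sums can be checked by slicing out the values that get combined into one output entry, using the question's `arr`:

```python
import numpy as np

arr = np.arange(0, 30).reshape(2, 3, 5)

# axis=0: arr[0, j, k] and arr[1, j, k] are added, e.g. 0 + 15
print(arr[:, 0, 0], arr.sum(axis=0)[0, 0])   # [ 0 15] 15
# axis=1: arr[i, 0, k] + arr[i, 1, k] + arr[i, 2, k], e.g. 0 + 5 + 10
print(arr[0, :, 0], arr.sum(axis=1)[0, 0])   # [ 0  5 10] 15
# axis=2: the five values arr[i, j, 0..4], e.g. 0 + 1 + 2 + 3 + 4
print(arr[0, 0, :], arr.sum(axis=2)[0, 0])   # [0 1 2 3 4] 10
```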

Your example is good, since the 3 dimensions are different, and it is easier to see which one was eliminated during the sum.

With 2d there's some ambiguity; 'sum over rows' - does that mean the rows are eliminated or retained? With 3d there's no ambiguity; with axis=0, you can only remove it, leaving the other 2.

Upvotes: 3

It's maybe a little easier to see with a simpler 3D array. After filling the array with ones, the numbers in the sums come out to be the size of the particular dimension summed over! The other two dimensions in each case are left intact.

import numpy as np
arr = np.arange(0,60).reshape(4,3,5)
arr
Out[10]: 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]],

       [[30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44]],

       [[45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

arr = np.ones_like(arr)  # same shape and dtype, every entry set to 1

arr
Out[12]: 
array([[[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]]])

arr0 = arr.sum(axis=0, keepdims=True)  # shape (1, 3, 5)
arr1 = arr.sum(axis=1, keepdims=True)  # shape (4, 1, 5)
arr2 = arr.sum(axis=2, keepdims=True)  # shape (4, 3, 1)

arr0
Out[20]: 
array([[[4, 4, 4, 4, 4],
        [4, 4, 4, 4, 4],
        [4, 4, 4, 4, 4]]])

arr1
Out[21]: 
array([[[3, 3, 3, 3, 3]],

       [[3, 3, 3, 3, 3]],

       [[3, 3, 3, 3, 3]],

       [[3, 3, 3, 3, 3]]])

arr2
Out[22]: 
array([[[5],
        [5],
        [5]],

       [[5],
        [5],
        [5]],

       [[5],
        [5],
        [5]],

       [[5],
        [5],
        [5]]])
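The same observation holds for any shape: summing an all-ones array over an axis fills the result with that axis's length, and with keepdims=True that axis shrinks to length 1. A short check for the (4, 3, 5) case:

```python
import numpy as np

shape = (4, 3, 5)
ones = np.ones(shape, dtype=int)

for axis in range(ones.ndim):
    s = ones.sum(axis=axis, keepdims=True)
    # every entry equals shape[axis]; that axis becomes length 1
    print(axis, s.shape, np.unique(s))
# 0 (1, 3, 5) [4]
# 1 (4, 1, 5) [3]
# 2 (4, 3, 1) [5]
```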

Upvotes: 0

Ananth Raghuraman

Reputation: 89

Think of a multi-dimensional array as a tree. Each dimension is a level in the tree. Each grouping at that level is a node. A sum along a specific axis (say axis=4) means coalescing (overlaying) all nodes at that level into a single node (under their respective parents). Sub-trees rooted at the overlaid nodes at that level are stacked on top of each other. All overlapping nodes' values are added together.
Picture: https://ibb.co/dg3P3w
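The tree picture can be sketched directly on nested Python lists. `tree_sum` and `overlay` below are hypothetical helpers written only to illustrate the idea (they are not NumPy functions): recurse down to the requested level, then overlay the sibling subtrees there by adding matching leaves.

```python
import numpy as np

# Hypothetical helpers illustrating the "overlay the nodes at one level" idea.
def tree_sum(node, axis):
    """Sum a nested list (a tree) over the level given by `axis`."""
    if axis == 0:
        return overlay(node)  # overlay the children at this level
    return [tree_sum(child, axis - 1) for child in node]

def overlay(children):
    """Stack equally shaped subtrees and add overlapping leaves."""
    first = children[0]
    if not isinstance(first, list):
        return sum(children)  # leaves: add the overlapping values
    return [overlay([c[i] for c in children]) for i in range(len(first))]

arr = np.arange(0, 30).reshape(2, 3, 5)
nested = arr.tolist()
for ax in range(arr.ndim):
    print(ax, tree_sum(nested, ax) == arr.sum(axis=ax).tolist())  # True for each axis
```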

Upvotes: 0

Daniel F

Reputation: 14399

You seem to be confused by the output style of numpy arrays. Each printed "row" of the output runs along the last index, not the first. Example:

import numpy as np
x=np.arange(1,4)
y=np.arange(10,31,10)
z=np.arange(100,301,100)
xy=x[:,None]+y[None,:]

xy
Out[100]: 
array([[11, 21, 31],
       [12, 22, 32],
       [13, 23, 33]])

Notice the tens place increments on the row, not the column, even though y is the second index.

xyz=x[:,None,None]+y[None,:,None]+z[None,None,:]
xyz
Out[102]: 
array([[[111, 211, 311],
        [121, 221, 321],
        [131, 231, 331]],

       [[112, 212, 312],
        [122, 222, 322],
        [132, 232, 332]],

       [[113, 213, 313],
        [123, 223, 323],
        [133, 233, 333]]])

Now the hundreds place increments along the row, even though z is the last index. This can be somewhat counter-intuitive to beginners.

Thus when you do np.sum(arr, axis=-1), you always sum over the "rows" as shown in the np.array([]) display format. For example, arr.sum(axis=2)[0,0] is 0+1+2+3+4=10.
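Using the xyz array from this answer, a quick check that axis=-1 is the same as axis=2 for a 3d array, and that it collapses each printed row into one number:

```python
import numpy as np

x = np.arange(1, 4)
y = np.arange(10, 31, 10)
z = np.arange(100, 301, 100)
xyz = x[:, None, None] + y[None, :, None] + z[None, None, :]

# axis=-1 is the last axis, i.e. axis=2 for a 3d array
print(np.array_equal(np.sum(xyz, axis=-1), xyz.sum(axis=2)))  # True

# the first printed row, xyz[0, 0, :], collapses to a single number
print(xyz[0, 0, :], "->", np.sum(xyz, axis=-1)[0, 0])  # [111 211 311] -> 633
```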

Upvotes: 0

akuiper

Reputation: 214927

Here is another way to interpret this. You can consider a multi-dimensional array as a tensor T[i][j][k], where i, j, k represent axes 0, 1, 2 respectively.

T.sum(axis = 0) is mathematically equivalent to:

$$\mathrm{result}_{jk} = \sum_{i} T_{ijk}$$

Similarly, T.sum(axis = 1):

$$\mathrm{result}_{ik} = \sum_{j} T_{ijk}$$

And T.sum(axis = 2):

$$\mathrm{result}_{ij} = \sum_{k} T_{ijk}$$

In other words, the axis you pass names the index that is summed over; for axis = 0, the first index is summed over. Written out with loops:

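# result[j][k] = sum of T[i][j][k] over i, for every j and k
result = [[sum(T[i][j][k] for i in range(T.shape[0])) for k in range(T.shape[2])] for j in range(T.shape[1])]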

for axis = 1:

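# result[i][k] = sum of T[i][j][k] over j, for every i and k
result = [[sum(T[i][j][k] for j in range(T.shape[1])) for k in range(T.shape[2])] for i in range(T.shape[0])]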

etc.
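The index notation maps almost word for word onto `np.einsum`: the index left out of the output subscripts is the one summed over. A short sketch comparing it with `sum(axis=...)` on the question's array:

```python
import numpy as np

T = np.arange(0, 30).reshape(2, 3, 5)

# The index dropped from the output subscripts is summed over.
r0 = np.einsum('ijk->jk', T)   # sum over i (axis 0)
r1 = np.einsum('ijk->ik', T)   # sum over j (axis 1)
r2 = np.einsum('ijk->ij', T)   # sum over k (axis 2)

print(np.array_equal(r0, T.sum(axis=0)),
      np.array_equal(r1, T.sum(axis=1)),
      np.array_equal(r2, T.sum(axis=2)))  # True True True
```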

Upvotes: 5

MSeifert

Reputation: 152587

If you want to keep the dimensions you can specify keepdims:

>>> import numpy as np
>>> arr = np.arange(0,30).reshape(2,3,5)
>>> arr.sum(axis=0, keepdims=True)
array([[[15, 17, 19, 21, 23],
        [25, 27, 29, 31, 33],
        [35, 37, 39, 41, 43]]])

Otherwise the axis you sum along is removed from the shape. An easy way to keep track of this is using the numpy.ndarray.shape property:

>>> arr.shape
(2, 3, 5)

>>> arr.sum(axis=0).shape
(3, 5)  # the first entry (index = axis = 0) was removed

>>> arr.sum(axis=1).shape
(2, 5)  # the second entry (index = axis = 1) was removed

You can also sum along multiple axes if you want (reducing the dimensionality by the number of specified axes):

>>> arr.sum(axis=(0, 1))
array([75, 81, 87, 93, 99])
>>> arr.sum(axis=(0, 1)).shape
(5,)  # the first and second entries were removed
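Two follow-ups on multi-axis sums, sketched on the same `arr`: summing over (0, 1) at once matches summing axis 0 twice in a row (after the first sum, the old axis 1 becomes the new axis 0), and combining it with keepdims=True leaves length-1 axes that broadcast back against the original array.

```python
import numpy as np

arr = np.arange(0, 30).reshape(2, 3, 5)

# Summing over (0, 1) at once vs. removing axis 0 twice in a row.
both = arr.sum(axis=(0, 1))
stepwise = arr.sum(axis=0).sum(axis=0)
print(both, np.array_equal(both, stepwise))  # [75 81 87 93 99] True

# With keepdims=True the summed axes stay as length 1,
# so the totals broadcast against the original (2, 3, 5) array.
totals = arr.sum(axis=(0, 1), keepdims=True)
print(totals.shape)  # (1, 1, 5)
shares = arr / totals
print(np.allclose(shares.sum(axis=(0, 1)), 1))  # True
```

Without keepdims=True the (5,) result would still broadcast here, because it lines up with the trailing axis; for a sum over axis=2 the (2, 3) result would not line up with (2, 3, 5), which is where keepdims matters most.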

Upvotes: 10

John Zwinck

Reputation: 249123

The axis you specify is the one that is effectively removed. So given a shape of (2,3,5), axis 0 gives (3,5), axis 1 gives (2,5), etc. This extends to any number of dimensions.
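The "removed axis" rule can be written as a small shape calculation and checked against NumPy on a 4d array. `reduced_shape` is a hypothetical helper for illustration only, assuming `axis` is an int or a tuple of ints:

```python
import numpy as np

def reduced_shape(shape, axis):
    """Hypothetical helper: the shape left after summing over `axis`.

    `axis` may be an int or a tuple of ints; negative values count from the end.
    """
    axes = axis if isinstance(axis, tuple) else (axis,)
    dropped = {ax % len(shape) for ax in axes}
    return tuple(n for i, n in enumerate(shape) if i not in dropped)

a = np.zeros((2, 3, 5, 7))  # a 4d array
for axis in [0, 1, 2, 3, -1, (1, 3)]:
    assert a.sum(axis=axis).shape == reduced_shape(a.shape, axis)

print(reduced_shape(a.shape, 1))       # (2, 5, 7)
print(reduced_shape(a.shape, (1, 3)))  # (2, 5)
```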

Upvotes: 0
