Pete Hamilton

Reputation: 7910

Normalising columns in numpy

Problem:

If I have a 2D numpy array as follows:

[[1., 0., 1.]
 [1., 0., 2.]
 [2., 0., 1.]]

I'd like to normalise all columns to sum to 1, resulting in:

[[0.25, 0.33, 0.25]
 [0.25, 0.33, 0.50]
 [0.50, 0.33, 0.25]]

Note that where a column sums to 0, I'd like its entries to be equally distributed, as in the middle column above. It's essentially just scaling, but with a special case.

Existing Attempt

If I were guaranteed that every column summed to more than 0, I could just do:

>>> x = np.array([[1,2,3],[4,5,6],[7,8,9]]) * 1.0

>>> x
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])

>>> x / np.sum(x, axis=0)
array([[ 0.08333333,  0.13333333,  0.16666667],
       [ 0.33333333,  0.33333333,  0.33333333],
       [ 0.58333333,  0.53333333,  0.5       ]])

But that fails for the first example because the all-zero column gets divided by 0, which gives nan for that column.

Ideal Solution

Ideally there would be a general solution that also extends to a third dimension. The example above works exactly the same way for 3D arrays, but still fails on the zero case.

>>> x = np.array([[[1,2,3],[4,5,6],[7,8,9]], [[10, 11, 12], [13, 14, 15], [16, 17, 18]]]) * 1.0

>>> x
array([[[  1.,   2.,   3.],
        [  4.,   5.,   6.],
        [  7.,   8.,   9.]],

       [[ 10.,  11.,  12.],
        [ 13.,  14.,  15.],
        [ 16.,  17.,  18.]]])

>>> x / np.sum(x, axis=0)
array([[[ 0.09090909,  0.15384615,  0.2       ],
        [ 0.23529412,  0.26315789,  0.28571429],
        [ 0.30434783,  0.32      ,  0.33333333]],

       [[ 0.90909091,  0.84615385,  0.8       ],
        [ 0.76470588,  0.73684211,  0.71428571],
        [ 0.69565217,  0.68      ,  0.66666667]]])

Upvotes: 1

Views: 134

Answers (1)

Fred Foo

Reputation: 363567

Just set the all-zero columns to an arbitrary non-zero value, then proceed as before:

>>> x = np.array([[1., 0., 1.],
...               [1., 0., 2.],
...               [2., 0., 1.]])
>>> x[:, np.all(x == 0, axis=0)] = 1
>>> x / np.sum(x, axis=0)
array([[ 0.25      ,  0.33333333,  0.25      ],
       [ 0.25      ,  0.33333333,  0.5       ],
       [ 0.5       ,  0.33333333,  0.25      ]])
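
Note that the assignment above modifies x in place. For the 3D case mentioned in the question, the same idea can be sketched as a small function that works along any axis and leaves the input untouched. The name normalise_along_axis and its axis parameter are made up for this illustration; only np.all, np.where and sum with keepdims=True are actual NumPy calls.

```python
import numpy as np

def normalise_along_axis(x, axis=0):
    """Scale x so slices along `axis` sum to 1; all-zero slices become uniform.

    Hypothetical helper for illustration, not part of NumPy.
    """
    x = np.asarray(x, dtype=float)
    # True where every entry along `axis` is zero; keepdims=True keeps the
    # reduced axis with length 1 so the mask broadcasts against x.
    all_zero = np.all(x == 0, axis=axis, keepdims=True)
    # Put 1 in every entry of an all-zero slice (np.where builds a new array,
    # so the caller's data is not modified).
    filled = np.where(all_zero, 1.0, x)
    # Divide each slice by its own sum.
    return filled / filled.sum(axis=axis, keepdims=True)
```

Called as normalise_along_axis(x) on the 3x3 example it should give the same result as above, and normalise_along_axis(x, axis=0) on a 3D array normalises across the first dimension as in the question. Like the one-liner, it only checks for slices that are entirely zero; a slice whose mixed-sign entries happen to sum to zero would still divide by zero.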

Upvotes: 6
