Reputation: 7910
Suppose I have a 2D NumPy array as follows:
[[1., 0., 1.]
 [1., 0., 2.]
 [2., 0., 1.]]
I'd like to normalise all columns to sum to 1, resulting in:
[[0.25, 0.33, 0.25]
 [0.25, 0.33, 0.50]
 [0.50, 0.33, 0.25]]
Note that when a column sums to 0, I'd like its entries to be distributed equally, as in the middle column above. It's essentially just scaling, but with a special case.
If I were guaranteed that every column sum is > 0, I could just do:
>>> x = np.array([[1,2,3],[4,5,6],[7,8,9]]) * 1.0
>>> x
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])
>>> x / np.sum(x, axis=0)
array([[ 0.08333333,  0.13333333,  0.16666667],
       [ 0.33333333,  0.33333333,  0.33333333],
       [ 0.58333333,  0.53333333,  0.5       ]])
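But that fails for the first example because the zero-sum column causes a division by zero: with default settings NumPy emits a RuntimeWarning and fills that column with nan.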
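To make the failure concrete, here is a minimal sketch of the plain division applied to the first array. The names `naive` and `zero_sum_cols` are only illustrative, and `np.errstate` is used here just to silence the warning while inspecting the result.

```python
import numpy as np

x = np.array([[1., 0., 1.],
              [1., 0., 2.],
              [2., 0., 1.]])

# Silence the floating-point warnings raised by 0/0 so the result can be inspected.
with np.errstate(invalid="ignore", divide="ignore"):
    naive = x / np.sum(x, axis=0)

# The middle column sums to 0, so every entry in it is 0/0, which is nan.
zero_sum_cols = np.isnan(naive).all(axis=0)
```

The other two columns are scaled correctly; only the zero-sum column needs special handling.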
Ideally there would be a general solution that also extends to a third dimension. The same division works unchanged for 3D arrays, but still fails in the zero case:
>>> x = np.array([[[1,2,3],[4,5,6],[7,8,9]], [[10, 11, 12], [13, 14, 15], [16, 17, 18]]]) * 1.0
>>> x
array([[[  1.,   2.,   3.],
        [  4.,   5.,   6.],
        [  7.,   8.,   9.]],

       [[ 10.,  11.,  12.],
        [ 13.,  14.,  15.],
        [ 16.,  17.,  18.]]])
>>> x / np.sum(x, axis=0)
array([[[ 0.09090909,  0.15384615,  0.2       ],
        [ 0.23529412,  0.26315789,  0.28571429],
        [ 0.30434783,  0.32      ,  0.33333333]],

       [[ 0.90909091,  0.84615385,  0.8       ],
        [ 0.76470588,  0.73684211,  0.71428571],
        [ 0.69565217,  0.68      ,  0.66666667]]])
Upvotes: 1
Views: 134
Reputation: 363567
Just set every entry of the all-zero columns to the same arbitrary non-zero value, then proceed as before (note that the assignment modifies x in place):
>>> x = np.array([[1., 0., 1.],
...               [1., 0., 2.],
...               [2., 0., 1.]])
>>> x[:, np.all(x == 0, axis=0)] = 1
>>> x / np.sum(x, axis=0)
array([[ 0.25      ,  0.33333333,  0.25      ],
       [ 0.25      ,  0.33333333,  0.5       ],
       [ 0.5       ,  0.33333333,  0.25      ]])
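If overwriting the input is undesirable, or the array has more than two dimensions, a non-mutating variant could look like the sketch below. `normalize_axis0` is a hypothetical helper name, not part of NumPy, and the sketch assumes non-negative inputs (so a zero sum implies all-zero entries) with the "columns" running along axis 0.

```python
import numpy as np

def normalize_axis0(x):
    """Scale x so sums along axis 0 equal 1; zero-sum slices become uniform."""
    x = np.asarray(x, dtype=float)
    # keepdims=True keeps the reduced axis so the result broadcasts against x.
    totals = x.sum(axis=0, keepdims=True)
    zero = totals == 0
    # Divide by 1 instead of 0 where the sum is zero, avoiding 0/0.
    scaled = x / np.where(zero, 1.0, totals)
    # Replace the zero-sum slices with the uniform value 1/n (assumes x >= 0).
    return np.where(zero, 1.0 / x.shape[0], scaled)

result = normalize_axis0([[1., 0., 1.],
                          [1., 0., 2.],
                          [2., 0., 1.]])
```

Because the sum keeps its reduced axis and everything else relies on broadcasting, the same function should apply unchanged to the 3D case from the question.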
Upvotes: 6