Jayne How

Reputation: 143

How to sum all values in each column and divide each column by the summed value

Basically, I have a 10000x10000 matrix named M, and every column contains 1s and 0s. I'm trying to count the number of 1s in each column and then divide every element in that column by this number.

This is what I have tried:

outbound_links = M[M == 1].count()  # number of 1s in each column

n = 10000  # len(M)
# build each row as its own list so rows don't share one list object
mat = [[0] * n for _ in range(n)]

# for each column
for col_index in range(0, n):

    # visit every row in this column
    for row_index in range(0, n):
        if M[row_index][col_index] == 1:
            mat[row_index][col_index] = 1 / outbound_links[col_index]
        else:
            mat[row_index][col_index] = 0

print(mat)
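A note on list initialization: writing `[[1] * 10000] * 10000` creates a single inner list and repeats a reference to it, so assigning to one row changes every row. A small demonstration with n = 3, plus a list comprehension that builds independent rows:

```python
n = 3

# Repeating the outer list repeats a reference to the same inner list
shared = [[0] * n] * n
shared[0][0] = 1
print(shared)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# A comprehension creates a new inner list for each row
independent = [[0] * n for _ in range(n)]
independent[0][0] = 1
print(independent)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```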

But the code is unable to run because the matrix seems to be too big. What alternatives could I use?

Upvotes: 4

Views: 1480

Answers (3)

Tomerikoo

Reputation: 19414

A non-NumPy way: iterate over all columns; for each one, count the ones and then divide each cell in that column by that count:

from random import randint

n = 4
mat = [[randint(0,1) for _ in range(n)] for _ in range(n)]

print(*mat, sep='\n')

for col in range(n):
    # count the number of 1s
    ones = sum(mat[row][col] for row in range(n))

    if ones:  # Avoid dividing by zero
        for row in range(n):
            mat[row][col] /= ones

print('\n', *mat, sep='\n')

An example run:

[1, 0, 0, 1]
[0, 1, 1, 0]
[0, 0, 0, 1]
[1, 1, 1, 1]


[0.5, 0.0, 0.0, 0.3333333333333333]
[0.0, 0.5, 0.5, 0.0]
[0.0, 0.0, 0.0, 0.3333333333333333]
[0.5, 0.5, 0.5, 0.3333333333333333]
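A variation on the same idea, sketched as a hypothetical helper `normalize_columns` (not part of the answer above): `zip(*mat)` yields the columns, so every column's count can be computed in one pass, and the result is built as a new matrix instead of being divided in place. Columns with no ones are left unchanged.

```python
def normalize_columns(mat):
    """Return a new list-of-lists with each column divided by its column sum.

    Assumes `mat` is a rectangular list of lists of 0/1 values.
    Columns whose sum is 0 are copied unchanged, avoiding division by zero.
    """
    # zip(*mat) yields one tuple per column, so this is one sum per column
    col_sums = [sum(col) for col in zip(*mat)]
    return [
        [value / s if s else value for value, s in zip(row, col_sums)]
        for row in mat
    ]


mat = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
print(*normalize_columns(mat), sep='\n')
```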

Upvotes: 0

NHL

Reputation: 287

You can try this:

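import numpy as np

mat = np.array(M, dtype=float)  # float dtype so divided values aren't truncated to ints
for i in range(mat.shape[1]):
    col_sum = np.sum(mat[:, i])  # number of ones in column i
    if col_sum:
        mat[:, i] = mat[:, i] / col_sum
    else:
        print("no ones in that column")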
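NumPy does not raise an exception when a column sums to 0; by default it emits a RuntimeWarning and produces nan (for 0/0). A loop-free sketch that checks for zero-sum columns explicitly, using the `where` and `out` arguments of `np.divide` (the helper name `normalize_columns_np` is hypothetical):

```python
import numpy as np


def normalize_columns_np(M):
    """Return a float copy of M with each column divided by its column sum.

    Columns whose sum is 0 (no ones) stay all zeros instead of becoming nan.
    """
    m = np.asarray(M, dtype=float)
    col_sums = m.sum(axis=0)  # one sum per column; for a 0/1 matrix, the count of ones
    # `where` skips the division for zero-sum columns, leaving the zeros from `out`
    return np.divide(m, col_sums, out=np.zeros_like(m), where=col_sums != 0)


print(normalize_columns_np([[1, 0, 1],
                            [1, 0, 0]]))
```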

Upvotes: -1

Péter Leéh

Reputation: 2119

As suggested in the comments, you should use numpy for this. I think this will do:

import numpy as np

m = np.random.randint(0, 2, (4, 4))

# array([[0, 1, 1, 0],
#        [0, 1, 0, 1],
#        [0, 1, 0, 1],
#        [1, 1, 1, 0]])

m / np.sum(m, axis=0)[np.newaxis, :]

# array([[0.  , 0.25, 0.5 , 0.  ],
#        [0.  , 0.25, 0.  , 0.5 ],
#        [0.  , 0.25, 0.  , 0.5 ],
#        [1.  , 0.25, 0.5 , 0.  ]])
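For the 10000x10000 case in the question, memory matters too: a dense float64 array of that size needs 10^8 × 8 bytes = 800 MB, and `m / col_sums` allocates a second array of the same size. A minimal sketch, assuming `m` is already a floating-point NumPy array (float32 here, which halves the footprint to 400 MB), that divides in place and leaves zero-sum columns as zeros instead of nan. The helper name `normalize_columns_inplace` is hypothetical.

```python
import numpy as np


def normalize_columns_inplace(m):
    """Divide each column of the float array `m` by its sum, modifying `m` in place.

    Zero-sum columns are skipped by `where`, so they keep their zeros.
    """
    col_sums = m.sum(axis=0)  # new 1-D array, one entry per column
    np.divide(m, col_sums, out=m, where=col_sums != 0)
    return m


# Small usage example with a made-up 0/1 matrix stored as float32
m = np.array([[0, 1, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 1, 0]], dtype=np.float32)
normalize_columns_inplace(m)
print(m)
```

Passing `out=m` reuses the input buffer, so no second full-size array is created for the result.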

Upvotes: 2
