Hanming Zeng

Reputation: 367

Calculate the column means of a matrix - how to optimize?

input:

[
[1,2,3,4,5],
[5,4,3,2,1],
[3,3,3,3,3]
]

output:

[3,3,3,3,3]

brute force solution:

def calculate_col_mean(matrix):
    # Accumulate the sum of each column, then divide by the number of rows.
    num_row = len(matrix)
    num_col = len(matrix[0])
    result = [0] * num_col
    for i in range(num_row):
        for j in range(num_col):
            result[j] += matrix[i][j]

    for j in range(num_col):
        result[j] = result[j] / num_row

    return result

This works for small datasets. If the dataset is really big (1 GB or more), how can I optimize this? Would threading help, and if so, how would I go about it?

PS: the brute-force approach took more than 2 hours to run on 1 GB of data.

Upvotes: 0

Views: 253

Answers (2)

NanoBennett

Reputation: 1890

I highly recommend using NumPy for something like this.

Go to your command line and activate your Python environment.

pip install numpy

Open Python at the command line or in a Jupyter Notebook (preferred).

import numpy as np
your_array = np.array([
[1,2,3,4,5],
[5,4,3,2,1],
[3,3,3,3,3]
])

The 0 passed to mean() is the axis to average along: axis 0 runs down the rows, so the result has one mean per column.

column_averages = your_array.mean(0)

print(column_averages)
[3. 3. 3. 3. 3.]
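If the 1 GB dataset is too large to hold comfortably in memory, one option is to keep a running sum of each column and process the rows in blocks. The following is only a sketch under assumptions: column_mean_in_chunks is a hypothetical helper, not a NumPy function, and it assumes you can read the data as a sequence of row blocks that all have the same number of columns.

```python
import numpy as np

def column_mean_in_chunks(row_chunks):
    """Column means of a matrix supplied as an iterable of 2-D row blocks.

    Hypothetical helper for illustration; only one block is in memory at a time.
    """
    total = None   # running sum of each column
    n_rows = 0     # number of rows seen so far
    for chunk in row_chunks:
        block = np.asarray(chunk, dtype=np.float64)
        col_sums = block.sum(axis=0)
        total = col_sums if total is None else total + col_sums
        n_rows += block.shape[0]
    if n_rows == 0:
        raise ValueError("no rows supplied")
    return total / n_rows

# The question's matrix, split into two blocks to imitate chunked reading.
chunks = [
    [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]],
    [[3, 3, 3, 3, 3]],
]
chunked_means = column_mean_in_chunks(chunks)
print(chunked_means)
```

In practice the blocks would come from whatever reader fits your file format. The point is that the per-block work is still done by NumPy rather than by nested Python loops.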

Upvotes: 0

sunnytown

Reputation: 1996

import numpy as np
a = np.array([[1,2,3,4,5],[5,4,3,2,1],[3,3,3,3,3]])
column_mean = a.mean(axis=0)
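To make the axis argument concrete, here is a small sketch on a different matrix (m is just an illustrative name, not from the question) where the column means and row means differ, so it is easy to see which one axis=0 returns.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4],
              [5, 6]])  # 3 rows, 2 columns

col_means = m.mean(axis=0)  # averages down the rows: one value per column
row_means = m.mean(axis=1)  # averages across the columns: one value per row

print(col_means.shape)  # (2,)
print(row_means.shape)  # (3,)
```

Here col_means holds the two column averages (3 and 4), which is what the question asks for, while row_means holds three values, one per row.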

Upvotes: 1
