Reputation: 367
input:
[
[1,2,3,4,5],
[5,4,3,2,1],
[3,3,3,3,3]
]
output:
[3,3,3,3,3]
brute force solution:
def calculate_col_mean(matrix):
    num_row = len(matrix)
    num_col = len(matrix[0])
    # Running sum for each column
    result = [0] * num_col
    for i in range(num_row):
        for j in range(num_col):
            result[j] += matrix[i][j]
    # Divide each column sum by the number of rows
    for i in range(num_col):
        result[i] = result[i] / num_row
    return result
This works for small datasets. If the dataset is really big (1 GB or more), how can I optimize this? Would threading help, and how would I go about it?
PS: the brute force approach ran for more than 2 hours on 1 GB of data.
Upvotes: 0
Views: 253
Reputation: 1890
I highly advise using NumPy for something like this.
Go to your command line and activate your Python environment, then install NumPy:
pip install numpy
Open Python at the command line or in a Jupyter Notebook (preferred) and run:
import numpy as np
your_array = np.array([
    [1,2,3,4,5],
    [5,4,3,2,1],
    [3,3,3,3,3]
])
The 0 passed to mean() is the axis to average along. Axis 0 collapses the rows, which gives one mean per column:
column_averages = your_array.mean(0)
print(column_averages)
[3. 3. 3. 3. 3.]
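To make the axis argument concrete, here is a small sketch using the same array as above. The names col_means and row_means are only illustrative. For this particular input every mean happens to be 3, so the shape of each result is what shows the difference between the two axes.

```python
import numpy as np

a = np.array([
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
    [3, 3, 3, 3, 3],
])

# axis=0 averages down the rows: one value per column, shape (5,)
col_means = a.mean(axis=0)

# axis=1 averages across the columns: one value per row, shape (3,)
row_means = a.mean(axis=1)

print(col_means.shape, row_means.shape)
```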
Upvotes: 0
Reputation: 1996
import numpy as np
a = np.array([[1,2,3,4,5],[5,4,3,2,1],[3,3,3,3,3]])
column_mean = a.mean(axis=0)
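If the full array does not fit in memory, one possible approach is to keep a running column sum and a row count while reading the data in pieces. The sketch below is only an illustration under that assumption: column_mean_chunked is a hypothetical helper, not a NumPy function, and the chunks here are slices of the small example array rather than pieces of a real 1 GB file.

```python
import numpy as np

def column_mean_chunked(chunks):
    """Column means over an iterable of 2-D arrays that share a column count.

    Hypothetical helper for illustration; not part of NumPy.
    """
    total = None   # running per-column sum
    n_rows = 0     # number of rows seen so far
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float64)
        col_sums = chunk.sum(axis=0)
        total = col_sums if total is None else total + col_sums
        n_rows += chunk.shape[0]
    if n_rows == 0:
        raise ValueError("no rows to average")
    return total / n_rows

a = np.array([[1,2,3,4,5],[5,4,3,2,1],[3,3,3,3,3]])

# Simulate chunked reading by handing over two rows at a time
chunks = (a[i:i + 2] for i in range(0, len(a), 2))
column_mean = column_mean_chunked(chunks)
```

How the chunks are produced depends on the file format, which the question does not state, so that part is left to the reader.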
Upvotes: 1