rahs

Reputation: 1899

statistics.mean() vs sum()/len() vs np.average() for a list of lists

Data: A list of equal-sized lists that have to be averaged along columns to return one averaged list

Is it faster to average the above-mentioned data in Python using either statistics.mean() or sum()/len(), or is it faster to convert it to a numpy array and then use np.average()?

Or is there no significant difference?

This question answers which method to use, but does not compare it with the alternatives.
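For concreteness, a minimal sketch of the shape involved (made-up values, not real data):

# Three equal-sized lists, to be averaged along columns
data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
# desired result (one list of column averages): [4.0, 5.0, 6.0]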

Upvotes: 2

Views: 3007

Answers (2)

Dani Mesejo

Reputation: 61920

You can measure the performance of the different proposals. I am assuming that averaging along the columns means reducing across the rows: for instance, if you have 1000 lists of 100 elements each, at the end you have one list of 100 averages.

import random
import numpy as np
import statistics
import timeit

data = [[random.random() for _ in range(100)] for _ in range(1000)]


def average(data):
    # numpy: convert to an array and average over the rows (axis 0),
    # giving one value per column
    return np.average(data, axis=0)


def sum_len(data):
    # pure Python: transpose with zip and compute sum/len per column
    return [sum(l) / len(l) for l in zip(*data)]


def mean(data):
    # statistics.mean applied to each column after transposing
    return [statistics.mean(l) for l in zip(*data)]


if __name__ == "__main__":
    print(timeit.timeit('average(data)', 'from __main__ import data,average', number=10))
    print(timeit.timeit('sum_len(data)', 'from __main__ import data,sum_len', number=10))
    print(timeit.timeit('mean(data)', 'from __main__ import data,mean', number=10))

Output

0.025441123012569733
0.029354612997849472
1.0484535950090503

It appears that statistics.mean is considerably slower (about 35 times slower) than np.average and the sum_len approach, and that np.average is marginally faster than sum_len.
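As a quick sanity check (a sketch, not part of the original script), the three functions can be compared for equal results before trusting the timings:

# Sanity check (sketch): all three approaches should give the same column averages
assert np.allclose(average(data), sum_len(data))
assert np.allclose(average(data), mean(data))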

Upvotes: 5

mmagnuski

Reputation: 1275

This may depend on the number of 'rows' and 'columns' (that is, the number of lists and the number of elements in each list), but with as few as 10 lists of 10 elements each you can already see numpy's advantage:

import numpy as np
from statistics import mean

# construct the data
n_rows = 10
n_columns = 10
data = [np.random.random(n_columns).tolist() for x in range(n_rows)]

# define functions, I take your 'along columns' to mean that
# the columns dimension is reduced with mean:
def list_mean(data):
    return [mean(x) for x in data]

def numpy_mean(data):
    return np.asarray(data).mean(axis=1)

# time results with %timeit magic in notebook:
%timeit list_mean(data)
# 528 µs ± 1.78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit numpy_mean(data)
# 19.7 µs ± 121 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In that case the numpy mean is about 27 times faster than the list comprehension, but with larger data the numpy speedup will be greater (with 100 lists of 100 elements each, numpy is about 70 times faster).
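Since %timeit is an IPython/notebook magic, here is a minimal sketch of how the same comparison could be run as a plain script with the standard-library timeit (sizes chosen to match the 100-by-100 case mentioned above; exact numbers will vary by machine):

import timeit
import numpy as np
from statistics import mean

n_rows, n_columns = 100, 100
data = [np.random.random(n_columns).tolist() for _ in range(n_rows)]

def list_mean(data):
    # statistics.mean on each inner list
    return [mean(x) for x in data]

def numpy_mean(data):
    # convert once to an array and reduce the columns dimension
    return np.asarray(data).mean(axis=1)

# time 100 calls of each and print the totals
print(timeit.timeit(lambda: list_mean(data), number=100))
print(timeit.timeit(lambda: numpy_mean(data), number=100))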

Upvotes: 2
