Avizipi

Reputation: 498

Performance of average on a Python list: sum(l)/len(l) vs numpy.mean(l)

I was fooling around with finding the best way to calculate the mean of a list in Python. Although I thought NumPy was optimized, my results show that you shouldn't use NumPy for this. I was wondering why, and how Python achieves this performance.

So basically I am trying to figure out how come native Python is faster than NumPy.

My code for testing:

import random
import numpy as np
import timeit

def average_native(l):
    return sum(l)/len(l)


def average_np(l):
    return np.mean(l)


def test_time(func, arg):
    starttime = timeit.default_timer()
    for _ in range(500):
        func(arg)
    return (timeit.default_timer() - starttime) / 500


for i in range(1, 7):
    numbers = []
    for _ in range(10**i):
        numbers.append(random.randint(0, 100))
    print("for " + str(10**i) + " numbers:")
    print(test_time(average_native, numbers))
    print(test_time(average_np, numbers))

The results:

for 10 numbers:
2.489999999999992e-07
8.465800000000023e-06
for 100 numbers:
8.554000000000061e-07
1.3220000000000009e-05
for 1000 numbers:
7.2817999999999495e-06
6.22666e-05
for 10000 numbers:
6.750499999999993e-05
0.0005553966000000001
for 100000 numbers:
0.0006954238
0.005352444999999999
for 1000000 numbers:
0.007034196399999999
0.0568878216

BTW, I ran the same code in C++ and was surprised to see that the Python code was faster. Test code:

#include <iostream>
#include <cstdlib>
#include <cstdint>
#include <vector>
#include <chrono>

float calculate_average(std::vector<int> vec_of_num)
{
    double sum=0;
    uint64_t cnt=0;
    for(auto & elem : vec_of_num)
    {
        cnt++;
        sum = sum + elem;     
    }
    return sum / cnt;
}
int main()
{
    // This program will create same sequence of
    // random numbers on every program run
    std::vector<int> vec;
    for(int i = 0; i < 1000000; i++)
       vec.push_back(rand());
    auto start = std::chrono::high_resolution_clock::now();
    for(int i = 0; i < 500; i++)
       calculate_average(vec);
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> float_ms = end - start;

    std::cout << "calculate_average() elapsed time is " << float_ms.count()/500 << " ms )" << std::endl;
    return 0;
}

results:

calculate_average() elapsed time is 11.2082 ms )

Am I missing something?

Edit: I was running the C++ code on an online compiler (probably without any optimizations). Also, it isn't the same hardware, and who knows what is going on on that server. After compiling and running the code on my own device, the code is much faster.

Edit 2: I changed the code to pass a NumPy array to the NumPy function, and we do see that for smaller arrays/lists native Python is better; however, after around 1000 values NumPy performs better. I don't really understand why. Which optimizations does NumPy have that produce these results?
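
For reference, a minimal sketch of that change, reusing test_time, average_native and average_np from the code above (the numbers_np name is just illustrative). The list is converted to a NumPy array once, outside the timed calls, so only np.mean() on an array is measured:

for i in range(1, 7):
    numbers = [random.randint(0, 100) for _ in range(10**i)]
    numbers_np = np.array(numbers)  # convert the list once, before timing
    print("for " + str(10**i) + " numbers:")
    print(test_time(average_native, numbers))   # sum()/len() still gets the plain list
    print(test_time(average_np, numbers_np))    # np.mean() now gets a NumPy array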

new results:

for 10 numbers:
2.4540000000000674e-07
6.722200000000012e-06
for 100 numbers:
8.497999999999562e-07
6.583400000000017e-06
for 1000 numbers:
6.990799999999964e-06
7.916000000000034e-06
for 10000 numbers:
6.61604e-05
1.5475799999999985e-05
for 100000 numbers:
0.0006671193999999999
8.412259999999994e-05
for 1000000 numbers:
0.0068192092
0.0008199298000000005

Maybe I need to restart this question :)

Upvotes: 1

Views: 998

Answers (4)

aereniil

Reputation: 1

My code is at the end.

Conclusion/answer to the question

From the test results below, the NumPy method (at least for computing the mean) is definitely faster than the Python built-in method, as long as you are working with big enough lists or arrays.

However, the Python built-in method I tested can outperform the NumPy one for very short (1D) lists or arrays. Here it happens for lists/arrays containing fewer than ~150 numbers, so you must be aware of it!

@norok2's answer above explains this behavior.

Unsurprisingly, you can also see that the relative performance of each method remains consistent, no matter how many times the functions are called.

Method

As I went through a tutorial about NumPy, I read that, with increasing size, NumPy array manipulations can be up to 30 times faster than the same operations on built-in Python lists.

So, out of curiosity, I made a script which measures the execution time of 2 functions (through a decorator):

  1. The NumPy method for computing the (here 1D-)array mean:
     def mean_array(array):
         return np.mean(array)
  2. The way to compute the mean with Python built-in functions:
     def mean_liste(liste):
         return sum(liste)/len(liste)

Randomness, reliability

Using NumPy's random number generation, I generated maxnsize lists, income. Each list income[i] contains nsize[i] randomly generated numbers between 1400 and 10000 (I imagined a case with the incomes from nsize[i] financial operations in a year).

Note that maxnsize is the number of times each function's timing test was repeated. This gave me a sample of maxnsize measurements of the execution time of the functions above, so I could compute a statistical mean and standard deviation to support the reliability of my tests.

To further improve the robustness of my conclusions, and to stress-test the comparison of the functions at different list sizes (hence spreading the execution-time scales), I also randomized each nsize[i] to be between 1 and a parameter nmaxoperations.

Thus, before the functions' performances are measured, all the incomes, the lists containing them, and the size of each list are randomly generated using np.random.randint.

I tested these parameters:

  1. maxnsize = 1e3,1e4,1e5
  2. For each maxnsize, I tested nmaxoperations = 10,1e2,..,1e6,1e7

I denote by x the ratio of the Python method's execution time to the NumPy one's.

Results

  • maxnsize = 1000
    • nmaxoperations=
      • 10 : x=0.2723±0.0654
      • 1e2 : x=0.7056±0.3035
      • 1e3 : x=4.4767±2.6933
      • 1e4 : x=23.670±11.838
      • 1e5 : x=42.059±14.877
  • maxnsize = 10000
    • nmaxoperations=
      • 10 : x=0.2290±0.0854
      • 1e2 : x=0.7654±0.3593
      • 1e3 : x=4.6912±2.9002
      • 1e4 : x=23.052±11.884
      • 1e5 : x=44.127±14.886
  • maxnsize = 100000
    • nmaxoperations=
      • 10 : x=0.2854±0.1108
      • 1e2 : x=0.7566±0.3488
      • 1e3 : x=4.6768±3.0187
      • 1e4 : x=23.835±11.870
      • 1e5 : x=43.780±14.326

Code

import numpy as np
import time

def measure_time(fonction):
    # Decorator: wraps a function so it returns (execution time, result)
    def modified_function(element):
        time_start = time.time()
        returned = fonction(element)
        time_end = time.time()
        time_execution = time_end - time_start
        return time_execution, returned

    return modified_function



@measure_time
def mean_array(array):
    return np.mean(array)

@measure_time
def mean_liste(liste):
    return sum(liste)/len(liste)

maxnsize = 100
nmaxoperations = 1000  
nsize = np.random.randint(1,high=nmaxoperations,size=maxnsize)
time_liste = []
time_array = []
ratio_performances = []
means_liste = []
means_array = []
for i in range(0,maxnsize):
    income = np.random.randint(1400,high=10000,size=nsize[i]) 
    perf_array = tuple(mean_array(income)) 
    perf_liste = tuple(mean_liste(income)) 
    time_liste.append(perf_liste[0])
    means_liste.append(perf_liste[1])
    time_array.append(perf_array[0])
    means_array.append(perf_array[1])
    ratio_performances.append(time_liste[i]/time_array[i]) 


time_mean_liste = np.mean(time_liste)
time_mean_array = np.mean(time_array)
time_std_liste = np.std(time_liste)
time_std_array = np.std(time_array)
mean_income_liste = np.mean(means_liste)
mean_income_array = np.mean(means_array)
mean_std_liste = np.std(means_liste)
mean_std_array = np.std(means_array)
ratio_performances_mean = np.mean(ratio_performances)
ratio_performances_std = np.std(ratio_performances)

print(u"With Python\'s list : mean = {}\u00B1{} computed within {}\u00B1{} s".format(mean_income_liste,
                                                                                    mean_std_liste,
                                                                                    time_mean_liste,
                                                                                    time_std_liste))

print(u"With Numpy\'s array : mean = {}\u00B1{} computed within {}\u00B1{} s".format(mean_income_array,
                                                                                    mean_std_array,
                                                                                    time_mean_array,
                                                                                    time_std_array))

print(u"Numy took {}\u00B1{} less time to compute mean thant with Python built-in methods".format(ratio_performances_mean,ratio_performances_std))

Upvotes: 0

norok2

Reputation: 26906

The function numpy.mean() is doing a lot more than what sum() and len() are doing, which is why it is so "slow".

The kind of functionality included in np.mean() is essentially what makes it a ufunc, and especially the support for n-dimensional arrays.
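
As a small illustration of that extra functionality (not from the original post): np.mean() can reduce along any axis of an n-dimensional array, which sum() / len() on a flat list cannot do directly.

import numpy as np

m = np.arange(12).reshape(3, 4)  # a small 2-D array
print(np.mean(m))                # mean of all elements -> 5.5
print(np.mean(m, axis=0))        # column means -> [4. 5. 6. 7.]
print(np.mean(m, axis=1))        # row means -> [1.5 5.5 9.5]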

However, the largest contributor to the speed difference between the naïve implementation and np.mean() is actually converting the list to a NumPy array.

Consider the following ways to compute the average:

  • this is essentially the approach you think is super-fast:
def mean_naive(seq):
    return sum(seq) / len(seq)
  • this is a numeric-safe implementation that is present in the standard Python library:
import statistics


def mean_st(seq):
    return statistics.mean(seq)
  • this uses the NumPy mean() function:
import numpy as np


def mean_np(seq):
    return np.mean(seq)
  • this is the same as the naïve approach but a conversion to a NumPy array is performed to factor out the NumPy array conversion cost:
import numpy as np


def mean_naive_conv(seq):
    np.array(seq)  # the result of the conversion is not used!
    return sum(seq) / len(seq)
  • this is a Numba-accelerated version of the naïve approach acting on NumPy arrays. Numba essentially converts the Python code to optimized machine code via just-in-time compilation with LLVM. If sum() / len() were really faster than compiled code, then mean_naive_conv() should outperform this one:
import numpy as np
import numba as nb


@nb.njit
def mean_naive_nb(seq):
    sum_ = 0
    for x in seq:
        sum_ += x
    return sum_ / len(seq)


def mean_naive_np_nb(seq):
    seq = np.array(seq)
    return mean_naive_nb(seq)
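
For completeness, a minimal sketch of the only_conv() helper that appears in the benchmark below; this definition is an assumption (it performs just the list-to-NumPy-array conversion, so the conversion cost shows up on its own; its is_good flag will read False, since it does not compute a mean):

import numpy as np


def only_conv(seq):
    # only the Python list -> NumPy array conversion, no mean is computed
    return np.array(seq)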

However, when we benchmark these with the following code:

import random


funcs = (
    mean_naive, mean_st, mean_np, mean_naive_conv, mean_naive_np_nb, only_conv)

timings = {}
for k in range(1, 20):
    n = 2 ** k
    seq = tuple(random.random() for _ in range(n))
    print(f"n = {n}, k = {k}")
    timings[n] = []
    base = funcs[0](seq)
    for func in funcs:
        res = func(seq)  # this ensures that JIT-ted code is compiled before benchmarking
        is_good = np.allclose(base, res)
        timed = %timeit -r 4 -n 8 -q -o func(seq)
        timing = timed.best * 1e6
        timings[n].append(timing)
        print(f"{func.__name__:>24}  {is_good!s:>5}  {timing:10.3f} µs")

to be plotted with:

import pandas as pd
import matplotlib.pyplot as plt


df = pd.DataFrame(data=timings, index=[func.__name__ for func in funcs]).transpose()
df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', ylim=[0, 40000])

fig = plt.gcf()
fig.patch.set_facecolor('white')

[plot: best timing vs. input size for each method]

and with:

df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', ylim=[0, 600], xlim=[0, 9000])

fig = plt.gcf()
fig.patch.set_facecolor('white')

[same plot, zoomed in on the smaller input sizes]

we can observe:

  • The statistics-based approach is by far the slowest
  • The naïve approach is by far the fastest
  • When comparing all the methods that do include a type conversion from Python list to NumPy array:
    • np.mean() is the fastest for larger input sizes, likely because it is compiled with specific optimizations (I'd speculate making optimal use of SIMD instructions); for smaller inputs, the running time is dominated by the overhead of supporting all the ufunc functionality
    • the Numba-accelerated version is the fastest for medium input sizes; for very small inputs, the running time is lengthened by the small, roughly constant, overhead of calling a Numba function
    • for very small inputs, sum() / len() eventually becomes the fastest

This indicates that sum() / len() is essentially slower than optimized C++ code acting on arrays.

Upvotes: 2

Pepijn Kramer

Reputation: 13076

You are copying the vector on each call to calculate_average(), which takes a lot of extra time.

#include <numeric>
#include <iostream>
#include <vector>
#include <chrono>
#include <random>


//!! pass vector by reference to avoid copies!!!!
double calculate_average(const std::vector<int>& vec_of_num)
{
    return static_cast<double>(std::accumulate(vec_of_num.begin(), vec_of_num.end(), 0)) / static_cast<double>(vec_of_num.size());
}

int main()
{
    std::mt19937 generator(1); // static std::mt19937 generator(std::random_device{}());
    std::uniform_int_distribution<int> distribution{ 0,1000 };

    // This program will create same sequence of
    // random numbers on every program run
    std::vector<int> values(1000000);

    for (auto& value : values)
    {
        value = distribution(generator);
    }

    auto start = std::chrono::high_resolution_clock::now();
    double sum{ 0.0 };

    for (int i = 0; i < 500; i++)
    {
        // force compiler to use average so it can't be optimized away
        sum += calculate_average(values);
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> float_ms = end - start;

    // force compiler to use sum so it can't be optimized away
    std::cout << "sum = " << sum << "\n"; 
    std::cout << "calculate_average() elapsed time is " << float_ms.count() / 500 << " ms )" << std::endl;
    return 0;
}

Upvotes: -1

CoffeeTableEspresso

Reputation: 2652

Your C++ code is much, much slower than it needs to be.

First, for the C++ code, you're copying the vector, which is probably what's taking most of the time. You want to write:

float calculate_average(const std::vector<int>& vec_of_num)

instead of

float calculate_average(std::vector<int> vec_of_num)

in order to avoid making the copy.

Second, make sure you've compiled with optimizations on.

For the numpy version, you're doing an extra conversion, which slows you down.

From the docs:

a: array_like
Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.

So whatever is passed to numpy.mean is first converted into a numpy.array, then the mean is computed. Making the Numpy array is probably taking a good portion of your time here.
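
A quick way to see this, as an illustrative sketch (names and sizes are arbitrary): time the list-to-array conversion and the mean on an already-built array separately.

import timeit

import numpy as np

l = list(range(100000))
a = np.array(l)

print(timeit.timeit(lambda: np.array(l), number=100))  # cost of the list -> array conversion
print(timeit.timeit(lambda: np.mean(a), number=100))   # cost of the mean on an existing array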

I'd suggest doing two more benchmarks and seeing how they compare with what you already have:

  1. The C++ version without the copying, as described above, and with optimizations turned on.
  2. The NumPy version where you pass in a NumPy array instead of a Python list.

Upvotes: 2
