Reputation: 498
I was fooling around with the best way to calculate the mean of a list in Python. Although I thought NumPy was optimized for this, my results show that you shouldn't use NumPy here. I was wondering why, and how Python achieves this performance.
So basically I am trying to figure out how native Python can be faster than NumPy.
My code for testing:
import random
import numpy as np
import timeit
def average_native(l):
    return sum(l) / len(l)

def average_np(l):
    return np.mean(l)

def test_time(func, arg):
    starttime = timeit.default_timer()
    for _ in range(500):
        func(arg)
    return (timeit.default_timer() - starttime) / 500

for i in range(1, 7):
    numbers = []
    for _ in range(10**i):
        numbers.append(random.randint(0, 100))
    print("for " + str(10**i) + " numbers:")
    print(test_time(average_native, numbers))
    print(test_time(average_np, numbers))
The results:
for 10 numbers:
2.489999999999992e-07
8.465800000000023e-06
for 100 numbers:
8.554000000000061e-07
1.3220000000000009e-05
for 1000 numbers:
7.2817999999999495e-06
6.22666e-05
for 10000 numbers:
6.750499999999993e-05
0.0005553966000000001
for 100000 numbers:
0.0006954238
0.005352444999999999
for 1000000 numbers:
0.007034196399999999
0.0568878216
By the way, I ran the same computation in C++ and was surprised to see that the Python code was faster. Test code:
#include <iostream>
#include <cstdlib>
#include <vector>
#include <chrono>
float calculate_average(std::vector<int> vec_of_num)
{
    double sum = 0;
    uint64_t cnt = 0;
    for (auto& elem : vec_of_num)
    {
        cnt++;
        sum = sum + elem;
    }
    return sum / cnt;
}

int main()
{
    // This program will create same sequence of
    // random numbers on every program run
    std::vector<int> vec;
    for (int i = 0; i < 1000000; i++)
        vec.push_back(rand());

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 500; i++)
        calculate_average(vec);
    auto end = std::chrono::high_resolution_clock::now();

    std::chrono::duration<double, std::milli> float_ms = end - start;
    std::cout << "calculate_average() elapsed time is " << float_ms.count() / 500 << " ms )" << std::endl;
    return 0;
}
results:
calculate_average() elapsed time is 11.2082 ms )
Am I missing something?
Edit: I was running the C++ code on an online compiler (probably without any optimizations). It also isn't the same hardware, and who knows what else is running on that server. After compiling and running the code on my own machine, it is much faster.
Edit2: I changed the code to pass a NumPy array to the NumPy function, and we do see that for smaller arrays/lists native Python is better, but after around 1000 values NumPy performs better. I don't really understand why. Which optimizations does NumPy have that produce these results?
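(For reference, the change amounts to something like this, reusing test_time, average_native and average_np from above, with the list converted to a NumPy array once, outside the timed call:)

numbers_np = np.array(numbers)               # convert once, before timing
print(test_time(average_native, numbers))    # built-in sum()/len() on the Python list
print(test_time(average_np, numbers_np))     # np.mean() on the pre-converted NumPy array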
new results:
for 10 numbers:
2.4540000000000674e-07
6.722200000000012e-06
for 100 numbers:
8.497999999999562e-07
6.583400000000017e-06
for 1000 numbers:
6.990799999999964e-06
7.916000000000034e-06
for 10000 numbers:
6.61604e-05
1.5475799999999985e-05
for 100000 numbers:
0.0006671193999999999
8.412259999999994e-05
for 1000000 numbers:
0.0068192092
0.0008199298000000005
Maybe I need to restart this question :)
Upvotes: 1
Views: 998
Reputation: 1
My code is at the end.
Conclusion/answer to the question
From the test results below, the NumPy method (at least for computing the mean) is definitely faster than the Python built-in approach, as long as you are working with big enough lists or arrays.
However, the Python built-in approach I tested can outperform the NumPy one for very short (1D) lists or arrays. Here it happens for lists/arrays containing fewer than ~150 numbers, so be aware of it!
@norok2's answer above explains this behavior.
Unsurprisingly, you can also see that the relative performance of the two methods remains consistent no matter how many times the functions are called.
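(As a quick sanity check of that crossover, here is a rough sketch, not the script used for the results below, that compares the two approaches around that size; the exact crossover point will depend on your hardware and on whether the data starts as a list or an array:)

import timeit
import numpy as np

# rough crossover check: sum()/len() on a Python list vs np.mean() on a pre-built array
for n in (50, 150, 500, 5000):
    data = list(range(n))
    arr = np.array(data)
    t_list = timeit.timeit(lambda: sum(data) / len(data), number=10_000)
    t_np = timeit.timeit(lambda: np.mean(arr), number=10_000)
    print(f"n={n:>5}  x = t_python / t_numpy = {t_list / t_np:.2f}")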
Method
As I went through a tutorial about NumPy, I read that with increasing size, NumPy array manipulations can be up to 30 times faster than the same operations on built-in Python lists.
So, out of curiosity, I made a script which measures the execution time of two functions (through a decorator):
def mean_array(array):
    return np.mean(array)

def mean_list(liste):
    return sum(liste) / len(liste)
Randomness, reliability
Using random number generation, I built maxnsize lists income. Each list income[i] contains nsize[i] randomly generated numbers between 1400 and 10000 (I figured a case with the incomes from nsize[i] financial operations in a year).
Note that maxnsize is also the number of times each function's timing test is repeated. This gave me a sample of maxnsize measurements of the execution time of the above functions, so I could infer a statistical mean and standard deviation to assess the reliability of my tests.
To further improve the robustness of my conclusions and to stress-test the comparison of the functions at different list sizes (hence scattering the execution-time scales), I also randomized each nsize[i] to be between 1 and a parameter nmaxoperations.
Thus, before we proceed to measure the functions' performance, all the incomes, the lists containing them, and the size of each list are randomly generated using numpy.random.randint.
I tested these parameters:
- maxnsize = 1e3, 1e4, 1e5
- for each maxnsize, nmaxoperations = 10, 1e2, .., 1e6, 1e7

I note x the time ratio of the Python method over the NumPy one (so x > 1 means NumPy was faster).
Results
maxnsize = 1000
- nmaxoperations = 10 : x = 0.2723 ± 0.0654
- nmaxoperations = 1e2: x = 0.7056 ± 0.3035
- nmaxoperations = 1e3: x = 4.4767 ± 2.6933
- nmaxoperations = 1e4: x = 23.670 ± 11.838
- nmaxoperations = 1e5: x = 42.059 ± 14.877

maxnsize = 10000
- nmaxoperations = 10 : x = 0.2290 ± 0.0854
- nmaxoperations = 1e2: x = 0.7654 ± 0.3593
- nmaxoperations = 1e3: x = 4.6912 ± 2.9002
- nmaxoperations = 1e4: x = 23.052 ± 11.884
- nmaxoperations = 1e5: x = 44.127 ± 14.886

maxnsize = 100000
- nmaxoperations = 10 : x = 0.2854 ± 0.1108
- nmaxoperations = 1e2: x = 0.7566 ± 0.3488
- nmaxoperations = 1e3: x = 4.6768 ± 3.0187
- nmaxoperations = 1e4: x = 23.835 ± 11.870
- nmaxoperations = 1e5: x = 43.780 ± 14.326
Code
import numpy as np
import time
def measure_time(fonction):
    def modified_function(element):
        time_start = time.time()
        returned = fonction(element)
        time_end = time.time()
        time_execution = time_end - time_start
        return time_execution, returned
    return modified_function

@measure_time
def mean_array(array):
    return np.mean(array)

@measure_time
def mean_liste(liste):
    return sum(liste) / len(liste)

maxnsize = 100
nmaxoperations = 1000

nsize = np.random.randint(1, high=nmaxoperations, size=maxnsize)

time_liste = []
time_array = []
ratio_performances = []
means_liste = []
means_array = []

for i in range(0, maxnsize):
    # note: income is a NumPy array, so mean_liste() runs the built-in sum()/len() on an array here
    income = np.random.randint(1400, high=10000, size=nsize[i])
    perf_array = tuple(mean_array(income))
    perf_liste = tuple(mean_liste(income))
    time_liste.append(perf_liste[0])
    means_liste.append(perf_liste[1])
    time_array.append(perf_array[0])
    means_array.append(perf_array[1])
    ratio_performances.append(time_liste[i] / time_array[i])

time_mean_liste = np.mean(time_liste)
time_mean_array = np.mean(time_array)
time_std_liste = np.std(time_liste)
time_std_array = np.std(time_array)

mean_income_liste = np.mean(means_liste)
mean_income_array = np.mean(means_array)
mean_std_liste = np.std(means_liste)
mean_std_array = np.std(means_array)

ratio_performances_mean = np.mean(ratio_performances)
ratio_performances_std = np.std(ratio_performances)

print(u"With Python's list : mean = {}\u00B1{} computed within {}\u00B1{} s".format(mean_income_liste,
                                                                                    mean_std_liste,
                                                                                    time_mean_liste,
                                                                                    time_std_liste))
print(u"With Numpy's array : mean = {}\u00B1{} computed within {}\u00B1{} s".format(mean_income_array,
                                                                                    mean_std_array,
                                                                                    time_mean_array,
                                                                                    time_std_array))
print(u"Numpy took {}\u00B1{} times less time to compute the mean than the Python built-in method".format(ratio_performances_mean,
                                                                                                          ratio_performances_std))
Upvotes: 0
Reputation: 26906
The function numpy.mean() is doing a lot more than what sum() and len() are doing; that is why it is so "slow".
The kind of functionality included in np.mean() is essentially what makes it a ufunc, especially the support for n-dimensional arrays.
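For example (an illustration of that extra functionality, not part of the benchmark below), np.mean() handles n-dimensional input, axis selection and dtype control, none of which sum() / len() gives you directly:

import numpy as np

# np.mean() works on n-dimensional arrays and supports axis/dtype arguments
a = np.arange(12).reshape(3, 4)
print(np.mean(a))                    # mean of all elements -> 5.5
print(np.mean(a, axis=0))            # column means -> [4. 5. 6. 7.]
print(np.mean(a, axis=1))            # row means -> [1.5 5.5 9.5]
print(np.mean(a, dtype=np.float64))  # choose the accumulator/result dtype explicitly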
However, the largest contributor to the speed difference between the naïve implementation and np.mean() is actually converting the list to a NumPy array.
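A quick way to see this (a minimal sketch, not taken from the benchmark below) is to time the conversion on its own against np.mean() called on a list and on a pre-converted array:

import random
import timeit
import numpy as np

seq = [random.random() for _ in range(1_000_000)]
arr = np.array(seq)

# np.mean() on a list pays for the list -> array conversion on every call
print(timeit.timeit(lambda: np.mean(seq), number=20))
# the conversion alone costs roughly as much
print(timeit.timeit(lambda: np.array(seq), number=20))
# np.mean() on an already-converted array is much cheaper
print(timeit.timeit(lambda: np.mean(arr), number=20))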
Consider the following ways to compute the average:
- the naïve implementation:

      def mean_naive(seq):
          return sum(seq) / len(seq)

- using the standard library statistics module:

      import statistics

      def mean_st(seq):
          return statistics.mean(seq)

- NumPy's mean() function:

      import numpy as np

      def mean_np(seq):
          return np.mean(seq)

- the naïve implementation, plus an unused conversion of the input to a NumPy array:

      import numpy as np

      def mean_naive_conv(seq):
          np.array(seq)  # the result of the conversion is not used!
          return sum(seq) / len(seq)

- the naïve loop compiled with Numba (which generates machine code through llvm), acting on the converted array. If sum() / len() is faster than C, then mean_naive_conv() should outperform this one:

      import numpy as np
      import numba as nb

      @nb.njit
      def mean_naive_nb(seq):
          sum_ = 0
          for x in seq:
              sum_ += x
          return sum_ / len(seq)

      def mean_naive_np_nb(seq):
          seq = np.array(seq)
          return mean_naive_nb(seq)
However, when we benchmark these with the following code:
import random

# definition assumed: only_conv() is referenced below but its body was not shown
# in the post; presumably it measures only the list -> NumPy array conversion
def only_conv(seq):
    return np.array(seq)

funcs = (
    mean_naive, mean_st, mean_np, mean_naive_conv, mean_naive_np_nb, only_conv)

timings = {}
for k in range(1, 20):
    n = 2 ** k
    seq = tuple(random.random() for _ in range(n))
    print(f"n = {n}, k = {k}")
    timings[n] = []
    base = funcs[0](seq)
    for func in funcs:
        res = func(seq)  # this ensures that JIT-ted code is compiled before benchmarking
        is_good = np.allclose(base, res)
        timed = %timeit -r 4 -n 8 -q -o func(seq)  # IPython magic: run inside IPython/Jupyter
        timing = timed.best * 1e6
        timings[n].append(timing)
        print(f"{func.__name__:>24} {is_good!s:>5} {timing:10.3f} µs")
to be plotted with:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(data=timings, index=[func.__name__ for func in funcs]).transpose()
df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', ylim=[0, 40000])
fig = plt.gcf()
fig.patch.set_facecolor('white')
and with:
df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', ylim=[0, 600], xlim=[0, 9000])
fig = plt.gcf()
fig.patch.set_facecolor('white')
(for some zooming on the smaller input sizes)
we can observe:

- the statistics-based approach is the slowest by far;
- among the approaches requiring the conversion of the list to a NumPy array, np.mean() is the fastest for larger input sizes, likely because it is compiled with specific optimizations (I'd speculate making optimal use of SIMD instructions); for smaller inputs, the running time is dominated by the overhead for supporting all the ufunc functionalities;
- sum() / len() eventually gets to be the fastest overall, but only because it skips the conversion: mean_naive_conv() does not outperform mean_naive_np_nb(). This indicates that sum() / len() is essentially slower than optimized C++ code acting on arrays.
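(If you want to run the same comparison outside IPython, where the %timeit magic above is not available, a rough equivalent using the standard timeit module could look like this, reusing the functions defined earlier:)

import random
import timeit

funcs = (mean_naive, mean_st, mean_np, mean_naive_conv, mean_naive_np_nb, only_conv)

timings = {}
for k in range(1, 20):
    n = 2 ** k
    seq = tuple(random.random() for _ in range(n))
    timings[n] = []
    for func in funcs:
        func(seq)  # warm-up call, so JIT-compiled code is built before timing
        # best of 4 repeats of 8 calls each, reported in µs per call
        best = min(timeit.repeat(lambda: func(seq), repeat=4, number=8)) / 8 * 1e6
        timings[n].append(best)
        print(f"n = {n:>8} {func.__name__:>24} {best:10.3f} µs")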
Upvotes: 2
Reputation: 13076
You are copying the vector on every call to calculate_average, which takes a lot of extra time.
#include <numeric>
#include <iostream>
#include <vector>
#include <chrono>
#include <random>
//!! pass vector by reference to avoid copies!!!!
double calculate_average(const std::vector<int>& vec_of_num)
{
    return static_cast<double>(std::accumulate(vec_of_num.begin(), vec_of_num.end(), 0)) / static_cast<double>(vec_of_num.size());
}

int main()
{
    std::mt19937 generator(1); // static std::mt19937 generator(std::random_device{}());
    std::uniform_int_distribution<int> distribution{ 0, 1000 };

    // This program will create same sequence of
    // random numbers on every program run
    std::vector<int> values(1000000);
    for (auto& value : values)
    {
        value = distribution(generator);
    }

    auto start = std::chrono::high_resolution_clock::now();
    double sum{ 0.0 };
    for (int i = 0; i < 500; i++)
    {
        // force compiler to use average so it can't be optimized away
        sum += calculate_average(values);
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> float_ms = end - start;

    // force compiler to use sum so it can't be optimized away
    std::cout << "sum = " << sum << "\n";
    std::cout << "calculate_average() elapsed time is " << float_ms.count() / 500 << " ms )" << std::endl;
    return 0;
}
Upvotes: -1
Reputation: 2652
The C++ version is much, much slower than it needs to be.
First, for the C++ code, you're copying the vector, which is probably what's taking most of the time. You want to write:
float calculate_average(const std::vector<int>& vec_of_num)
instead of
float calculate_average(std::vector<int> vec_of_num)
in order to avoid making the copy.
Second, make sure you've compiled with optimizations on.
For the numpy version, you're doing an extra conversion, which slows you down. From the numpy.mean documentation:
a : array_like
    Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.
So whatever is passed to numpy.mean is first converted into a numpy.array, then the mean is computed. Making the NumPy array is probably taking a good portion of your time here.
I'd suggest doing two more benchmarks and seeing how they compare with what you already have:
(1) C++ version without the copying, as I describe above, and make sure optimizations are on. (2) Numpy version where you pass in a numpy array instead of a Python list.
Upvotes: 2