Ivan Shelonik

Reputation: 2028

The fastest way to create np.arrays for each item from list of tuples

There is a list of tuples l = [(x,y,z), (x,y,z), (x,y,z)]. The goal is to find the fastest way to create a separate np.array for the x values, the y values, and the z values. To compare the speed of the candidates I use the code below:

import time

def myfast():
    ...  # one of the candidate snippets below

n = 1000000
t0 = time.time()
for i in range(n): myfast()
t1 = time.time()

total_n = t1-t0

1.  np.array([i[0] for i in l])
    np.array([i[1] for i in l])
    np.array([i[2] for i in l])

output: 0.9980638027191162

2.  array_x = np.zeros((len(l), 1), dtype="float")
    array_y  = np.zeros((len(l), 1), dtype="float")
    array_z  = np.zeros((len(l), 1), dtype="float")

    for i, zxc in enumerate(l):
        array_x[i] = zxc[0]
        array_y[i] = zxc[1]
        array_z[i] = zxc[2]

output: 5.5509934425354

3. [np.array(x) for x in zip(*l)]

output: 2.5070037841796875

4.  array_x, array_y, array_z = np.array(list(zip(*l)))

output: 2.725318431854248
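For reference, a complete runnable instance of the harness with option 1 plugged in (a sketch using a small sample list; the actual l comes from elsewhere):

```python
import time

import numpy as np

# small sample input; in practice l comes from elsewhere
l = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)]

def myfast():
    # option 1: one list comprehension per coordinate
    array_x = np.array([i[0] for i in l])
    array_y = np.array([i[1] for i in l])
    array_z = np.array([i[2] for i in l])
    return array_x, array_y, array_z

n = 10000  # lowered from 1000000 so the demo finishes quickly
t0 = time.time()
for _ in range(n):
    myfast()
t1 = time.time()

total_n = t1 - t0
print(total_n)
```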

Upvotes: 1

Views: 570

Answers (4)

norok2

Reputation: 26896

I believe most (but not all) of the ingredients of this answer are present in the other answers, but so far I have not seen an apples-to-apples comparison: some approaches do not return a list of np.ndarray objects, but rather a single np.ndarray() (which is convenient, in my opinion).

It is not clear whether this is acceptable to you, so I am adding proper code for both. Beyond that, the performance may differ because in some cases you are adding an extra step, while in others you may not need to create large objects (which could reside in different memory pages).

In the end, for smaller inputs (3 x 10), the list of np.ndarray()s is just some additional burden that adds significantly to the timing. For larger inputs (3 x 1000) and above, the extra computation is no longer significant, and an approach involving comprehensions and avoiding the creation of a large NumPy array can become as fast as (or even faster than) the methods that were fastest for smaller inputs.

Also, all the code presented here works for arbitrary sizes of the tuples/list (as long as the inner tuples all have the same size, of course).

(EDIT: added a comment on the final results)


The tested methods are:

import numpy as np


def to_arrays_zip(items):
    return np.array(list(zip(*items)))


def to_arrays_transpose(items):
    return np.array(items).transpose()


def to_arrays_zip_split(items):
    return [arr for arr in np.array(list(zip(*items)))]


def to_arrays_transpose_split(items):
    return [arr for arr in np.array(items).transpose()]


def to_arrays_comprehension(items):
    return [np.array([items[i][j] for i in range(len(items))]) for j in range(len(items[0]))]


def to_arrays_comprehension2(items):
    return [np.array([item[j] for item in items]) for j in range(len(items[0]))]

(This is a convenience function to check that the results are the same.)

def test_equal(items1, items2):
    return all(np.all(x == y) for x, y in zip(items1, items2))

For small inputs:

N = 3
M = 10
ll = [tuple(range(N)) for _ in range(M)]

print(to_arrays_comprehension2(ll))

print('Returning `np.ndarray()`')
%timeit to_arrays_zip(ll)
# 2.82 µs ± 28 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit to_arrays_transpose(ll)
# 3.18 µs ± 30 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

print('Returning a list')
%timeit to_arrays_zip_split(ll)
# 3.71 µs ± 47 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit to_arrays_transpose_split(ll)
# 3.97 µs ± 42.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit to_arrays_comprehension(ll)
# 5.91 µs ± 96.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit to_arrays_comprehension2(ll)
# 5.14 µs ± 109 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Where the podium is:

  1. to_arrays_zip_split() (or the non-_split variant if you are OK with a single array)
  2. to_arrays_transpose_split() (or the non-_split variant if you are OK with a single array)
  3. to_arrays_comprehension2()

For somewhat larger inputs:

N = 3
M = 1000
ll = [tuple(range(N)) for _ in range(M)]

print('Returning `np.ndarray()`')
%timeit to_arrays_zip(ll)
# 146 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit to_arrays_transpose(ll)
# 222 µs ± 2.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

print('Returning a list')
%timeit to_arrays_zip_split(ll)
# 147 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit to_arrays_transpose_split(ll)
# 221 µs ± 2.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit to_arrays_comprehension(ll)
# 261 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit to_arrays_comprehension2(ll)
# 212 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The podium becomes:

  1. to_arrays_zip_split() (whether you use the _split or the non-_split variant makes little difference)
  2. to_arrays_comprehension2()
  3. to_arrays_transpose_split() (whether you use the _split or the non-_split variant makes little difference)

For even larger inputs:

N = 3
M = 1000000
ll = [tuple(range(N)) for _ in range(M)]

print('Returning `np.ndarray()`')
%timeit to_arrays_zip(ll)
# 215 ms ± 4.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit to_arrays_transpose(ll)
# 220 ms ± 4.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

print('Returning a list')
%timeit to_arrays_zip_split(ll)
# 218 ms ± 6.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit to_arrays_transpose_split(ll)
# 222 ms ± 3.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit to_arrays_comprehension(ll)
# 248 ms ± 3.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit to_arrays_comprehension2(ll)
# 186 ms ± 481 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The podium becomes:

  1. to_arrays_comprehension2()
  2. to_arrays_zip_split() (whether you use the _split or the non-_split variant makes little difference)
  3. to_arrays_transpose_split() (whether you use the _split or the non-_split variant makes little difference)

and the _zip and _transpose variants are pretty close to each other.

(I also tried to speed things up with Numba, but that did not go well.)

Upvotes: 0

Ralf

Reputation: 16505

There are some really good options in here, so I summarized them and compared their speed:

import numpy as np

def f1(input_data):
    array_x = np.array([elem[0] for elem in input_data])
    array_y = np.array([elem[1] for elem in input_data])
    array_z = np.array([elem[2] for elem in input_data])

    return array_x, array_y, array_z

def f2(input_data):
    array_x = np.zeros((len(input_data), ), dtype="float")
    array_y = np.zeros((len(input_data), ), dtype="float")
    array_z = np.zeros((len(input_data), ), dtype="float")

    for i, elem in enumerate(input_data):
        array_x[i] = elem[0]
        array_y[i] = elem[1]
        array_z[i] = elem[2]

    return array_x, array_y, array_z

def f3(input_data):
    return [np.array(elem) for elem in zip(*input_data)]

def f4(input_data):
    return np.array(list(zip(*input_data)))

def f5(input_data):
    return np.array(input_data).transpose()

def f6(input_data):
    array_all = np.array(input_data)
    array_x = array_all[:, 0]
    array_y = array_all[:, 1]
    array_z = array_all[:, 2]

    return array_x, array_y, array_z

First I asserted that all of them return the same data (using np.array_equal()):

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for array_list in zip(f1(data), f2(data), f3(data), f4(data), f5(data), f6(data)):
    # print()
    # for i, arr in enumerate(array_list):
    #     print('array from function', i+1)
    #     print(arr)
    for i, arr in enumerate(array_list[:-1]):
        assert np.array_equal(arr, array_list[i+1])

And the time comparison:

import timeit
for f in [f1, f2, f3, f4, f5, f6]:
    t = timeit.timeit('f(data)', 'from __main__ import data, f', number=100000)
    print('{:5s} {:10.4f} seconds'.format(f.__name__, t))

gives these results:

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]    # 3 tuples
timeit number=100000
f1        0.3184 seconds
f2        0.4013 seconds
f3        0.2826 seconds
f4        0.2091 seconds
f5        0.1732 seconds
f6        0.2159 seconds

data = [(1, 2, 3) for _ in range(10**6)]    # 1 million tuples
timeit number=10
f1        2.2168 seconds
f2        2.8657 seconds
f3        2.0150 seconds
f4        1.9790 seconds
f5        2.6380 seconds
f6        2.6586 seconds

making f5() the fastest option for short input and f4() the fastest option for big input.


If the number of elements in each tuple is more than 3, then only 3 of the functions apply (the others are hardcoded for 3-element tuples):

data = [tuple(range(10**4)) for _ in range(10**3)]
timeit number=10
f3       11.8396 seconds
f4       13.4672 seconds
f5        4.6251 seconds

making f5() again the fastest option for these criteria.

Upvotes: 2

Håkon T.

Reputation: 1202

Maybe I am missing something, but why not just pass the list of tuples directly to np.array? Say:

n = 100
l = [(0, 1, 2) for _ in range(n)]

arr = np.array(l)
x = arr[:, 0]
y = arr[:, 1]
z = arr[:, 2]
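If you do want three separate names, the same idea can be written as a one-line unpacking via the transpose, since iterating a 2-D array yields its rows (a sketch; note that x, y, z are then views into the underlying array, not copies):

```python
import numpy as np

n = 100
l = [(0, 1, 2) for _ in range(n)]

# transposing makes the coordinates the rows, so unpacking works directly
x, y, z = np.array(l).T
print(x.shape, y.shape, z.shape)  # (100,) (100,) (100,)
```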

Btw, I prefer to use the following to time code:

from timeit import default_timer as timer

t0 = timer()
do_heavy_calculation()
print("Time taken [sec]:", timer() - t0)

Upvotes: 1

kederrac

Reputation: 17322

you could try:

import numpy
array_x, array_y, array_z = numpy.array(list(zip(*l)))

or just:

numpy.array(list(zip(*l)))

and a more elegant way:

numpy.array(l).transpose()

Upvotes: 2
