Reputation: 3058

Repeating list of indices

I'm trying to create a list of indices that cycles from 0 to m - 1 and is of length n. I've thus far achieved this as follows:

import numpy as np
m = 7
n = 12
indices = [np.mod(i, m) for i in np.arange(n)]

which results in the following:

[0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4]

Is there a faster way to achieve this?

Thanks for any suggestions.

Upvotes: 3

Answers (6)

Eelco Hoogendoorn

Reputation: 10769

Just drop the roundtrip to python using the list comprehension. Getting good speed from numpy is all about making sure your loops stay within numpy, and not looping within python itself.

np.mod(np.arange(n), m)

This is the only numpythonically correct answer, in the sense that it makes it obvious that all for loops in python are avoided. (edit: as demonstrated in other answers, it turns out to be far from the fastest solution though)

Upvotes: 2

Divakar

Reputation: 221684

NumPy based one with np.tile -

np.tile(np.arange(m),(n+m-1)//m)[:n]

Sample run -

In [58]: m,n = 7,12

In [59]: np.tile(np.arange(m),(n+m-1)//m)[:n]
Out[59]: array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4])

Benchmarking

If you are looking for efficiency, especially on decent to large sized data, NumPy would do well. In this section we are timing NumPy solutions with variations across m and n.

Setup :

import numpy as np

def resize(m,n):
    return np.resize(np.arange(m), n)

def mod(m,n):
    return np.mod(np.arange(n), m)

def tile(m,n):
    return np.tile(np.arange(m),(n+m-1)//m)[:n]

Run timings code on IPython console :

# Setup inputs and timeit those on posted NumPy approaches
m_ar = [10,100,1000]
s_ar = [10,20,50,100,200,500,1000] # scaling array

resize_timings = []
mod_timings = []
tile_timings = []
sizes_str = []
for m in m_ar:
    for s in s_ar:
        n = m*s+m//2
        size_str = str(m) + 'x' + str(n)
        sizes_str.append(size_str)

        p = %timeit -o -q resize(m,n)
        resize_timings.append(p.best)

        p = %timeit -o -q mod(m,n)
        mod_timings.append(p.best)

        p = %timeit -o -q tile(m,n)
        tile_timings.append(p.best)

Get results on plot :

# Use pandas to study results  
import pandas as pd

df_data = {'Resize':resize_timings,'Mod':mod_timings,'Tile':tile_timings}
df  = pd.DataFrame(df_data,index=sizes_str)

FGSZ = (20,6)
T = 'Timings(s)'
FTSZ = 16
df.plot(figsize=FGSZ,title=T,fontsize=FTSZ).get_figure().savefig("timings.png")

Results

Comparing all three

resize and tile based ones seem to be doing well.

Comparing resize and tile

Lets's plot those two only :

tile seems to be doing better between these two.

Studying in chunks

Now, let's study the timings in chunks corresponding to three different m's :

mod based one wins only on small m and small n and the timings on those m's and n's are of the order of 5-6 u-secs, but loses out on most of the other scenarios, and it's the compute involving it that's killing it on those.

Upvotes: 1

user3053452

Reputation: 675

Since you're asking for the fastest possible it would be good to provide some test times. Thus I tested most of the posted snippets with timeit module.

Quickly defined functions for easier calls to timeit.

def list_comp(m, n):
    return [np.mod(i, m) for i in np.arange(n)]

def leftover(m, n):
    nb_cycles = n//m
    leftover = n-m*nb_cycles
    indices = list(range(m))*nb_cycles + list(range(leftover))

def islice_cycle(m, n):
    return list(islice(cycle(range(m)), n))

def npmod(m, n):
    return mod(np.arange(m), n)

def resized(m, n):
    return np.resize(np.arange(m), n)

Tested with:

timer = timeit.Timer(stmt="function_name(7, 12)", globals=globals()).repeat(repeat=100, number =10000)
print(f'Min: {min(timer):.6}s,\n Avg: {np.average(timer):.6}s')

Results

| Function       |   Minimum   |    Average   |  
|:---------------|------------:|:------------:|  
| list_comp      | 0.156117s   |  0.160433s   |  
| islice_cycle   | 0.00712442s |  0.00726821s |  
| npmod          | 0.0118933s  |  0.0123122s  |  
| leftover       | 0.00943538s |  0.00964464s |  
| resized        | 0.0818617s  |  0.0851646s  |

@Austin answer using islice and cycle is the fastest of all so far. @T.Lucas is a bit slower, but only by a small margin, but quite fast for pure python.

Other answers are by a magnitude slower.

Upvotes: 1

T.Lucas

Reputation: 848

If you are looking for speed, this is the faster way I found:

nb_cycles = n//m
leftover = n-m*nb_cycles
indices = list(range(m))*nb_cycles + list(range(leftover))

Upvotes: 0

yatu

Reputation: 88305

Given that you're using numpy, one way would be using np.arange and np.resize, which will fill the larger resized array with copies of the original array:

np.resize(np.arange(7), 12)
# array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4])

Upvotes: 1

Austin

Reputation: 26037

You can use islice + cycle from itertools:

from itertools import islice, cycle

print(list(islice(cycle(range(7)), 12)))
# [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4]

Upvotes: 3

Repeating list of indices

Answers (6)

Benchmarking

Results

Studying in chunks

Related Questions