Reputation: 3058
I'm trying to create a list of indices that cycles from 0
to m - 1
and is of length n
. I've thus far achieved this as follows:
import numpy as np
m = 7
n = 12
indices = [np.mod(i, m) for i in np.arange(n)]
which results in the following:
[0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4]
Is there a faster way to achieve this?
Thanks for any suggestions.
Upvotes: 3
Views: 129
Reputation: 10759
Just drop the roundtrip to python using the list comprehension. Getting good speed from numpy is all about making sure your loops stay within numpy, and not looping within python itself.
np.mod(np.arange(n), m)
This is the only numpythonically correct answer, in the sense that it makes it obvious that all for loops in python are avoided. (edit: as demonstrated in other answers, it turns out to be far from the fastest solution though)
Upvotes: 2
Reputation: 221524
NumPy based one with np.tile
-
np.tile(np.arange(m),(n+m-1)//m)[:n]
Sample run -
In [58]: m,n = 7,12
In [59]: np.tile(np.arange(m),(n+m-1)//m)[:n]
Out[59]: array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4])
If you are looking for efficiency, especially on decent to large sized data, NumPy would do well. In this section we are timing NumPy solutions with variations across m
and n
.
Setup :
import numpy as np
def resize(m,n):
return np.resize(np.arange(m), n)
def mod(m,n):
return np.mod(np.arange(n), m)
def tile(m,n):
return np.tile(np.arange(m),(n+m-1)//m)[:n]
Run timings code on IPython console :
# Setup inputs and timeit those on posted NumPy approaches
m_ar = [10,100,1000]
s_ar = [10,20,50,100,200,500,1000] # scaling array
resize_timings = []
mod_timings = []
tile_timings = []
sizes_str = []
for m in m_ar:
for s in s_ar:
n = m*s+m//2
size_str = str(m) + 'x' + str(n)
sizes_str.append(size_str)
p = %timeit -o -q resize(m,n)
resize_timings.append(p.best)
p = %timeit -o -q mod(m,n)
mod_timings.append(p.best)
p = %timeit -o -q tile(m,n)
tile_timings.append(p.best)
Get results on plot :
# Use pandas to study results
import pandas as pd
df_data = {'Resize':resize_timings,'Mod':mod_timings,'Tile':tile_timings}
df = pd.DataFrame(df_data,index=sizes_str)
FGSZ = (20,6)
T = 'Timings(s)'
FTSZ = 16
df.plot(figsize=FGSZ,title=T,fontsize=FTSZ).get_figure().savefig("timings.png")
Comparing all three
resize
and tile
based ones seem to be doing well.
Comparing resize
and tile
Lets's plot those two only :
tile
seems to be doing better between these two.
Now, let's study the timings in chunks corresponding to three different m's
:
mod
based one wins only on small m
and small n
and the timings on those m's and n's are of the order of 5-6 u-secs, but loses out on most of the other scenarios, and it's the compute involving it that's killing it on those.
Upvotes: 1
Reputation: 675
Since you're asking for the fastest possible it would be good to provide some test times. Thus I tested most of the posted snippets with timeit
module.
Quickly defined functions for easier calls to timeit.
def list_comp(m, n):
return [np.mod(i, m) for i in np.arange(n)]
def leftover(m, n):
nb_cycles = n//m
leftover = n-m*nb_cycles
indices = list(range(m))*nb_cycles + list(range(leftover))
def islice_cycle(m, n):
return list(islice(cycle(range(m)), n))
def npmod(m, n):
return mod(np.arange(m), n)
def resized(m, n):
return np.resize(np.arange(m), n)
Tested with:
timer = timeit.Timer(stmt="function_name(7, 12)", globals=globals()).repeat(repeat=100, number =10000)
print(f'Min: {min(timer):.6}s,\n Avg: {np.average(timer):.6}s')
Results
| Function | Minimum | Average |
|:---------------|------------:|:------------:|
| list_comp | 0.156117s | 0.160433s |
| islice_cycle | 0.00712442s | 0.00726821s |
| npmod | 0.0118933s | 0.0123122s |
| leftover | 0.00943538s | 0.00964464s |
| resized | 0.0818617s | 0.0851646s |
@Austin answer using islice
and cycle
is the fastest of all so far. @T.Lucas is a bit slower, but only by a small margin, but quite fast for pure python.
Other answers are by a magnitude slower.
Upvotes: 1
Reputation: 848
If you are looking for speed, this is the faster way I found:
nb_cycles = n//m
leftover = n-m*nb_cycles
indices = list(range(m))*nb_cycles + list(range(leftover))
Upvotes: 0