Reputation: 152637
I want to classify some data I have and for that to work I want to chain indexing of python lists. Simplified I have a nested list:
lst = [[[1], [2]], [[3, 3], [4]], [[5], [6,6,6]]]
and I want to iterate over the product of the first two indices but keep the third one the same:
from itertools import product
for index1, index2 in product(range(3), range(2)):
print(lst[index1][index2][0])
However I want to make this more general without knowing in advance how many substructures deep this needs to go (I want to make the number of range
s passed to itertools.product
variable).
I'm struggling a bit how I can generalize the [index1][index2][0]
to accept an arbitary number of indices
, the best I could come up with was functools.reduce
:
from functools import reduce
for indices in product(range(3), range(2)):
print(reduce(list.__getitem__, indices, lst)[0])
That seems awfully complicated (and is significantly slower than the manual indexing), so I was wondering if there was a better and faster way to do that. I'm using both python 2.x and 3.x and external libraries are definetly okay (however it shouldn't require NumPy
or packages based on NumPy
).
Upvotes: 4
Views: 683
Reputation: 18628
I propose a recursive way.
def theshape(lst):
l=lst
shape=[]
while isinstance(l,list):
shape.append(len(l))
l=l[0]
return shape
This function is intended to find the shape of your structure, which is supposed regular until last dimension.
def browse(lst):
shape=theshape(lst)
ndim=len(shape)
def level(l,k):
if k==ndim:
print(l)
else:
for i in range(shape[k]):
level(l[i],k+1)
level(lst,0)
This one browse recursively all the levels. it minimize pointer changes.
A simple example :
u=arange(2**6).reshape(4,2,1,2,2,1,1,2).tolist()
browse(u)
0
2
.
.
.
62
Some tests on big structures (with print discarded by print = lambda _ : None
) :
def go(lst):
for x in product(*[range(k) for k in theshape(lst)]):
print(reduce(lambda result, index: result[index], x, lst))
In [1]: u=arange(2**21).reshape([2]*21).tolist()
In [2]: %time go(u)
Wall time: 14.8 s
In [3]: %time browse(u)
Wall time: 3.5 s
In [5]: u=arange(2**21).reshape([1]*30+[2**21]+[1]).tolist()
In [6]: %time go(u)
Wall time: 18 s
In [7]: %time browse(u)
Wall time: 3.48 s
In [8]: u=arange(2**21).reshape([1]+[2**21]+[1]*30).tolist()
In [9]: %time go(u)
Wall time: 14 s
In [10]: %time browse(u)
Wall time: 58.1 s
This shows that performance is very data-structure dependent.
EDIT:
Finally, simplest is fastest. theshape
is not necessary.
def browse2(lst):
if isinstance(lst,list):
for l in lst:
browse2(l)
else: print(lst)
it is often 30% faster than browse. And it works whatever the structure of the list.
Upvotes: 2
Reputation: 15926
I would just use the python-builtin reduce
for this, and it doesn’t seem that complex and it wasn’t that much slower in my testing:
from itertools import product
for x in product(range(3), range(2)):
rg = reduce(lambda result, index: result[index], x, lst)
value = rg[0]
If you're worried about the timing penalty from reduce
, you can just use a for
loop instead:
for x in product(range(3), range(2)):
value = lst
for index in x:
value = value[index]
value = value[0]
This will, in all cases, be slower than manual indexing because a for
loop will require extra operations for figuring out the stop condition. The question, as always, is whether the speed optimization is worth it to you for the flexibility of arbitrary depth specification.
As for why you would use reduce
vs. for
, there is an ongoing stylistic debate within the JavaScript community as to whether you should use the reduce
, map
, filter
functions on Array
s or do the for
loop version instead because it is faster, and you may want to refer to that debate to pick which side you fall on.
Timing with for loop:
In [22]: stmt = '''
...: from itertools import product
...: def go():
...: lst = [[[1], [2]], [[3, 3], [4]], [[5], [6,6,6]]]
...: for x in product(range(3), range(2)):
...: # rg = reduce(lambda result, index: result[index], x, lst)
...: value = lst
...: for index in x:
...: value = value[index]
...: value = value[0]
...: # value = lst[x[0]][x[1]][0]
...: '''
In [23]: timeit(setup=stmt, stmt='go()', number=1000000)
Out[23]: 4.003296852111816
Timing with reduce
:
In [18]: stmt = '''
...: from itertools import product
...: def go():
...: lst = [[[1], [2]], [[3, 3], [4]], [[5], [6,6,6]]]
...: for x in product(range(3), range(2)):
...: rg = reduce(lambda result, index: result[index], x, lst)
...: value = rg[0]
...: # value = lst[x[0]][x[1]][0]
...: '''
In [19]: timeit(setup=stmt, stmt='go()', number=1000000)
Out[19]: 6.164631128311157
Timing with manual indexing:
In [16]: stmt = '''
...: from itertools import product
...: def go():
...: lst = [[[1], [2]], [[3, 3], [4]], [[5], [6,6,6]]]
...: for x in product(range(3), range(2)):
...: # rg = reduce(lambda result, index: result[index], x, lst)
...: value = lst[x[0]][x[1]][0]
...: '''
In [17]: timeit(setup=stmt, stmt='go()', number=1000000)
Out[17]: 3.633723020553589
Upvotes: 1
Reputation: 3706
how about dynamically creating the hard indexing?
lst = [[[1], [2]], [[3, 3], [4]], [[5], [6,6,6]]]
from itertools import product
for index1, index2 in product(range(3), range(2)):
print(lst[index1][index2][0])
# need depth info from somewhere to create hard coded indexing
prod_gen = product(range(3), range(2))
first = next(prod_gen)
indx_depth = len(first) + 1
exec( ('def IndexThisList(lst, indxl):\n' +
' return lst' + ''.join(('[indxl[' + str(i) + ']]'
for i in range(indx_depth)))))
# just to see what it exec'd:
print(("def IndexThisList(lst, indx_itrbl):\n" +
" return lst" + ''.join(('[indx_itrbl[' + str(i) + ']]'
for i in range(indx_depth)))))
# the exec is only invoked again when changing the indexing depth
# for accessing the list with its currently instantiated depth of indexing
# just use the current instance of the generated function
print(IndexThisList(lst, first + (0,)))
for itpl in prod_gen:
print (IndexThisList(lst, itpl + (0,)))
1
2
3
4
5
6
def IndexThisList(lst, indx_itrbl):
return lst[indx_itrbl[0]][indx_itrbl[1]][indx_itrbl[2]]
1
2
3
4
5
6
just a beginner at this, seems like my exec should be wrapped with another function to pass the index_depth but its eluding me for now
Upvotes: 1