Vectorize a function for lists of lists

Question

I need an efficient Python data structure with these characteristics:

a structure like lists nested many times
with no regularity in list length
with a fixed depth
always holding the same data type.

Here is an example with a depth of 3:

[[[1,2] , [3,4,5]],
 [[6,7,8] , [9] , [10] , [11,12]],
 [[13] , [14,15] , [16,17,18]]]

Most of the time the structure would contain numpy arrays or numbers, but it could also hold another kind of object, such as a dict. Within a given structure, however, the data type is always the same.

My main problem is that I need to apply functions to such structures (like a "vectorized" function). I want my functions to take several structures with the same shape as arguments, and return other ones.

Which of these would be the most efficient, in your opinion?

Using stacked lists or stacked numpy arrays: then how do I code an efficient "vectorize" function without using many for loops?
Using a single array and storing the shape separately (I'm currently working on this)
Creating my own structure, but how could I make it more efficient than stacked lists?
Or maybe you know a library which could help me with that?

I am especially looking for RAM efficiency.

I hope I've made my problem clear, thanks for your help.

Alain T. · Accepted Answer

If you want truly vectorized processing, you will need to use a library such as numpy. But such libraries will likely limit the datatypes that you can support, because they work on arrays with a single fixed element type (and, for some libraries, so that the data can be processed by GPUs).

In either case, you could use a dictionary to flatten the structure and facilitate batch processing of its elements. This would be a dictionary with tuples as keys, where each entry in the tuple represents the index of the value at that level.

For example:

[ 
  [ [1,2] ,   [3,4,5] ],
  [ [6,7,8] , [9] ,     [10] ,     [11,12] ],
  [ [13] ,    [14,15] , [16,17,18] ]
]

could be represented in such a dictionary as:

{ 
  (0,0,0) : 1,
  (0,0,1) : 2,
  (0,1,0) : 3,
  (0,1,1) : 4,
  (0,1,2) : 5,
  (1,0,0) : 6,
  (1,0,1) : 7,
  (1,0,2) : 8,
  (1,1,0) : 9,
  (1,2,0) : 10,
  (1,3,0) : 11,
  (1,3,1) : 12,
  (2,0,0) : 13,
  (2,1,0) : 14,
  (2,1,1) : 15,
  (2,2,0) : 16,
  (2,2,1) : 17,
  (2,2,2) : 18
}
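As a minimal sketch of how such a dictionary might be built, the helpers below walk the nested lists down to a given depth and back. The names `flatten` and `unflatten` are hypothetical (they are not part of any library), the depth is passed explicitly because a leaf may itself be a list-like object, and `unflatten` assumes the indices at every level are contiguous and start at 0. Note that empty sub-lists leave no trace in the dictionary.

```python
def flatten(nested, depth):
    """Map each leaf of a nested list to a tuple of its indices at every level."""
    flat = {}

    def walk(node, path):
        if len(path) == depth:      # reached a leaf: store it under its index path
            flat[path] = node
            return
        for i, child in enumerate(node):
            walk(child, path + (i,))

    walk(nested, ())
    return flat


def unflatten(flat):
    """Rebuild nested lists from a dictionary produced by flatten()."""
    root = []
    for path in sorted(flat):       # lexicographic order visits branches in order
        node = root
        for i in path[:-1]:
            if i == len(node):      # first visit to this branch: create it
                node.append([])
            node = node[i]
        node.append(flat[path])
    return root


nested = [
    [[1, 2], [3, 4, 5]],
    [[6, 7, 8], [9], [10], [11, 12]],
    [[13], [14, 15], [16, 17, 18]],
]
flat = flatten(nested, depth=3)
```

Applying a function to several structures of the same shape then becomes a single comprehension over the keys, and `unflatten` restores the nested form if it is needed afterwards.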

This could also be represented in numpy using two arrays: a 2D array of level indices with one row per value, and a 1D array holding the data in the same order.
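A rough sketch of that two-array layout for the depth-3 example, assuming numeric leaves. The array names `levels` and `data` follow the snippets in this answer; the explicit loops are only used once, to build the arrays, and the `int32` index type is an arbitrary choice.

```python
import numpy as np

nested = [
    [[1, 2], [3, 4, 5]],
    [[6, 7, 8], [9], [10], [11, 12]],
    [[13], [14, 15], [16, 17, 18]],
]

# Collect one row of indices per leaf, and the leaf values in the same order.
index_rows, values = [], []
for i, branch in enumerate(nested):
    for j, leaf_list in enumerate(branch):
        for k, value in enumerate(leaf_list):
            index_rows.append((i, j, k))
            values.append(value)

levels = np.array(index_rows, dtype=np.int32)  # shape (number of leaves, depth)
data = np.array(values)                        # shape (number of leaves,)

# A boolean mask on one index column selects a whole branch at once.
second_branch_sum = data[levels[:, 0] == 1].sum()
```

The index array costs one integer per level per value, so for RAM usage it is worth picking the smallest integer type that can hold the largest index.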

Processing between structures of this type would provide fast traversal of leaf values in the tree structure while maintaining the relationships between branches.

For example:

# sum of values under the second branch (data is the dictionary):
result = sum( value for level,value in data.items() if level[0] == 1 )

# or using numpy (levels is the 2D index array, data is the 1D value array):
result = np.sum(data[levels[:,0]==1])

# adding two structures (a key missing from one structure counts as 0):
result = { k:data1.get(k,0)+data2.get(k,0) for k in data1.keys() | data2.keys() }

# or using numpy (assuming same levels, in the same order, in both structures)
resultLevels, resultData = levels1,data1+data2

# numpy adding structures with different levels is a bit more involved
# but still feasible.
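
One possible way to handle different levels in numpy, sketched under the assumption that the data is numeric and that a missing entry counts as 0 (as in the dictionary version). The function name `add_structures` is hypothetical. It relies on `np.unique` with `axis=0` to merge the index rows and on `np.add.at` to accumulate the values that share a row.

```python
import numpy as np

def add_structures(levels1, data1, levels2, data2):
    """Add two (levels, data) structures whose index rows may differ."""
    all_levels = np.concatenate([levels1, levels2])
    all_data = np.concatenate([data1, data2])

    # Unique index rows (sorted lexicographically) and, for every input row,
    # the position of its match among the unique rows.
    result_levels, inverse = np.unique(all_levels, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # flatten defensively; the shape varies across NumPy versions

    # Accumulate values that map to the same index row.
    result_data = np.zeros(len(result_levels), dtype=all_data.dtype)
    np.add.at(result_data, inverse, all_data)
    return result_levels, result_data


levels1 = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 0]])
data1 = np.array([1, 2, 3])
levels2 = np.array([[0, 0, 1], [2, 0, 0]])
data2 = np.array([10, 20])

result_levels, result_data = add_structures(levels1, data1, levels2, data2)
```

Here only the row `(0, 0, 1)` appears in both inputs, so its values are summed, while the other rows are carried over unchanged.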
