Find largest row in a matrix with numpy (row with highest length)

Question

I have a massive array with rows and columns. Some rows are larger than others. I need to get the max length row, that is, the row that has the highest length. I wrote a simple function for this, but I wanted it to be as fas as possible, like numpy fast. Currently, it looks like this:

Example array:

values = [
    [1,2,3],
    [4,5,6,7,8,9],
    [10,11,12,13]
]

def values_max_width(values):
    max_width = 1
    for row in values:
        if len(row) > max_width:
            max_width = len(row)
    return max_width

Is there any way to accomplish this with numpy?

hpaulj · Accepted Answer

In [261]: values = [ 
     ...:     [1,2,3], 
     ...:     [4,5,6,7,8,9], 
     ...:     [10,11,12,13] 
     ...: ] 
     ...:                                                                       
In [262]:                                                                       
In [262]: values                                                                
Out[262]: [[1, 2, 3], [4, 5, 6, 7, 8, 9], [10, 11, 12, 13]]
In [263]: def values_max_width(values): 
     ...:     max_width = 1 
     ...:     for row in values: 
     ...:         if len(row) > max_width: 
     ...:             max_width = len(row) 
     ...:     return max_width 
     ...:                                                                       
In [264]: values_max_width(values)                                              
Out[264]: 6
In [265]: [len(v) for v in values]                                              
Out[265]: [3, 6, 4]
In [266]: max([len(v) for v in values])                                         
Out[266]: 6
In [267]: np.max([len(v) for v in values])                                      
Out[267]: 6

Your loop and the list comprehension are similar in speed, np.max is much slower - it has to first turn the list into an array.

In [268]: timeit max([len(v) for v in values])                                  
656 ns ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [269]: timeit np.max([len(v) for v in values])                               
13.9 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [271]: timeit values_max_width(values)                                       
555 ns ± 13 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

If you are starting with a list, it's a good idea to thoroughly test the list implementation. numpy is fast when it is doing compiled array stuff, but creating an array from a list is time consuming.

Making an array directly from values isn't much help. The result in a object dtype array:

In [272]: arr = np.array(values)                                                
In [273]: arr                                                                   
Out[273]: 
array([list([1, 2, 3]), list([4, 5, 6, 7, 8, 9]), list([10, 11, 12, 13])],
      dtype=object)

Math on such an array is hit-or-miss, and always slower than math on pure numeric arrays. We can iterate on such an array, but that iteration is slower than on a list.

In [275]: values_max_width(arr)                                                 
Out[275]: 6
In [276]: timeit values_max_width(arr)                                          
1.3 µs ± 8.27 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Find largest row in a matrix with numpy (row with highest length)

Answers (2)

Related Questions