Kurt
Kurt

Reputation: 297

Julia: Searching for a column in a sorted matrix

I have a matrix that is sorted like the one shown below

1 1 2 2 3

1 2 3 4 1

2 1 2 1 1

It's a bit hard for me to describe the ordering, but hopefully it's clear from the example. The rough idea is that we first sort on the first row, then the second, etc.

I would like to find a specific column in the matrix, and that column may or may not exist in it.

I tried the following code:

index = searchsortedfirst(1:total_cols, col, lt=(index,x) -> (matrix[: index] < x))

The above code works, but it is slow. I profiled the code, and it spends a lot of time in "_get_index". I then tried the following

  @views index = searchsortedfirst(1:total_cols, col, lt=(index,x) -> (matrix[: index] < x))

As expected this helped a lot, likely due to the slices I'm taking. However, is there a better way to go about this? There still seems to be a lot of overhead, and I feel like there might be a cleaner way to write this, which would be easier to optimize.

However, I absolutely value speed over clarity.

Here is some code I wrote to compare binary vs. linear search.

using Profile

function test_search()
    max_val = 20
    rows = 4
    matrix = rand(1:max_val, rows, 10^5)
    matrix = Array{Int64,2}(sortslices(matrix, dims=2))

    indices = @time @profile lin_search(matrix, rows, max_val, 10^3)
    indices = @time @profile bin_search(matrix, rows, max_val, 10^3)
end
function bin_search(matrix, rows, max_val, repeats)
    indices = zeros(repeats)
    x = zeros(Int64, rows)
    cols = size(matrix)[2]
    for i = 1:repeats
        x = rand(1:max_val, rows)
        @inbounds @views index = searchsortedfirst(1:cols, x, lt=(index,x)->(matrix[:,index] < x))
        indices[i] = index
    end
    return indices
end

function array_eq(matrix, index, y, rows)
    for i=1:rows
        @inbounds if view(matrix, i, index) != y[i]
            return false
        end
    end
    return true
end

function lin_search(matrix, rows, max_val, repeats)
    indices = zeros(repeats)
    x = zeros(Int64, rows)
    cols = size(matrix)[2]

    for i = 1:repeats
        index = cols + 1
        x = rand(1:max_val, rows)
        for j=1:cols
            if array_eq(matrix, j, x, rows)
                index = j;
                break
            end
        end
        indices[i] = index
    end
    return indices
end

Profile.clear()
test_search()

Here is some sample output

0.041356 seconds (68.90 k allocations: 3.431 MiB)
0.070224 seconds (110.45 k allocations: 5.418 MiB)

After adding some more @inbounds, it looks like a linear search is faster than binary. Seems strange when there are 10^5 columns.

Upvotes: 1

Views: 181

Answers (2)

Kurt
Kurt

Reputation: 297

I discovered two issues, that when fixed result in both linear and binary searches being much faster. And the binary search becomes faster than linear.

First, there was some type instability. I changed on one of the lines to

matrix::Array{Int64,2} = Array{Int64,2}(sortslices(matrix, dims=2))

This resulted in an order of magnitude speedup. Also it turns out that using @views does not do anything in the following code

@inbounds @views index = searchsortedfirst(1:cols, x, lt=(index,x)->(matrix[:,index] < x))

I am new to Julia, but my hunch is that since matrix[:,index] is copied no matter what in the anonymous function. This would make sense, since it allows for closures.

If I write a separate non-anonymous function, then that copy goes away. Linear search didn't copy the slices, so this also really sped up the binary search.

Upvotes: 0

carstenbauer
carstenbauer

Reputation: 10117

If speed is most important, why not simply use the fact that Julia allows you to write fast loops?

julia> function findcol(M, col)                
           @inbounds @views for c in axes(M, 2)
               M[:,c] == col && return c       
           end                                 
           return nothing                      
       end                                     
findcol (generic function with 1 method)       

julia> col = [2,3,2];                          

julia> M = [1 1 2 2 3;                         
           1 2 3 4 1;                          
           2 1 2 1 1];                         

julia> @btime findcol($M, $col)                
  32.854 ns (3 allocations: 144 bytes)         
3  

This should probably be fast enough and does not even take into account any ordering.

Upvotes: 2

Related Questions