Reputation: 297
I have a matrix that is sorted like the one shown below
1 1 2 2 3
1 2 3 4 1
2 1 2 1 1
It's a bit hard for me to describe the ordering, but hopefully it's clear from the example. The rough idea is that we first sort on the first row, then the second, etc.
I would like to find a specific column in the matrix, and that column may or may not exist in it.
I tried the following code:
index = searchsortedfirst(1:total_cols, col, lt=(index,x) -> (matrix[: index] < x))
The above code works, but it is slow. I profiled the code, and it spends a lot of time in "_get_index". I then tried the following
@views index = searchsortedfirst(1:total_cols, col, lt=(index,x) -> (matrix[: index] < x))
As expected this helped a lot, likely due to the slices I'm taking. However, is there a better way to go about this? There still seems to be a lot of overhead, and I feel like there might be a cleaner way to write this, which would be easier to optimize.
However, I absolutely value speed over clarity.
Here is some code I wrote to compare binary vs. linear search.
using Profile
function test_search()
max_val = 20
rows = 4
matrix = rand(1:max_val, rows, 10^5)
matrix = Array{Int64,2}(sortslices(matrix, dims=2))
indices = @time @profile lin_search(matrix, rows, max_val, 10^3)
indices = @time @profile bin_search(matrix, rows, max_val, 10^3)
end
function bin_search(matrix, rows, max_val, repeats)
indices = zeros(repeats)
x = zeros(Int64, rows)
cols = size(matrix)[2]
for i = 1:repeats
x = rand(1:max_val, rows)
@inbounds @views index = searchsortedfirst(1:cols, x, lt=(index,x)->(matrix[:,index] < x))
indices[i] = index
end
return indices
end
function array_eq(matrix, index, y, rows)
for i=1:rows
@inbounds if view(matrix, i, index) != y[i]
return false
end
end
return true
end
function lin_search(matrix, rows, max_val, repeats)
indices = zeros(repeats)
x = zeros(Int64, rows)
cols = size(matrix)[2]
for i = 1:repeats
index = cols + 1
x = rand(1:max_val, rows)
for j=1:cols
if array_eq(matrix, j, x, rows)
index = j;
break
end
end
indices[i] = index
end
return indices
end
Profile.clear()
test_search()
Here is some sample output
0.041356 seconds (68.90 k allocations: 3.431 MiB)
0.070224 seconds (110.45 k allocations: 5.418 MiB)
After adding some more @inbounds, it looks like a linear search is faster than binary. Seems strange when there are 10^5 columns.
Upvotes: 1
Views: 181
Reputation: 297
I discovered two issues, that when fixed result in both linear and binary searches being much faster. And the binary search becomes faster than linear.
First, there was some type instability. I changed on one of the lines to
matrix::Array{Int64,2} = Array{Int64,2}(sortslices(matrix, dims=2))
This resulted in an order of magnitude speedup. Also it turns out that using @views does not do anything in the following code
@inbounds @views index = searchsortedfirst(1:cols, x, lt=(index,x)->(matrix[:,index] < x))
I am new to Julia, but my hunch is that since matrix[:,index] is copied no matter what in the anonymous function. This would make sense, since it allows for closures.
If I write a separate non-anonymous function, then that copy goes away. Linear search didn't copy the slices, so this also really sped up the binary search.
Upvotes: 0
Reputation: 10117
If speed is most important, why not simply use the fact that Julia allows you to write fast loops?
julia> function findcol(M, col)
@inbounds @views for c in axes(M, 2)
M[:,c] == col && return c
end
return nothing
end
findcol (generic function with 1 method)
julia> col = [2,3,2];
julia> M = [1 1 2 2 3;
1 2 3 4 1;
2 1 2 1 1];
julia> @btime findcol($M, $col)
32.854 ns (3 allocations: 144 bytes)
3
This should probably be fast enough and does not even take into account any ordering.
Upvotes: 2