Shayan
Shayan

Reputation: 6295

How to calculate the product of each pair columns of a matrix efficiently in Julia?

I want to calculate the product of each pair of columns of a Matrix:

 ┌─     ─┐         ┌─┐ ┌──┐ ┌──┐
 │ 1 3 4 │ ────►   │3│ │4 │ │12│
 │ 2 1 5 │         │2│ │10│ │5 │
 └─     ─┘         └─┘ └──┘ └──┘

I tried:

julia> using Combinatorics

julia> prods = collect(
         Iterators.map(
           idx->arr[:, idx[1]] .* arr[:, idx[2]],
           combinations(axes(arr, 2), 2)
         )
       )
3-element Vector{Vector{Int64}}:
 [3, 2]
 [4, 10]
 [12, 5]

Since the compiler each time should compile the anonymous function, and I make copies 6 times and collect the result, this isn't efficient to me. What is the best way?

Upvotes: 0

Views: 91

Answers (3)

AboAmmar
AboAmmar

Reputation: 5559

A classical loop would be 5X faster than combinatorics.

function prod_col(arr)
    m, n = size(arr)
    out = similar(arr, m, n*(n-1)÷2)
    c = 1
    for i = 1:n, j = i+1:n
        for k = 1:m
            out[k,c] = arr[k,i] * arr[k,j]
        end
        c += 1
    end
    out
end

Testing with a 10-by-10 matrix gives:

arr = rand(0:9,10,10)
@benchmark prod_col($arr)
@benchmark prod_cols_pairs($arr)

BenchmarkTools.Trial: 10000 samples with 195 evaluations.
 Range (min … max):  526.667 ns … 48.783 μs  ┊ GC (min … max):  0.00% … 97.62%
 Time  (median):       1.969 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):     2.328 μs ±  3.950 μs  ┊ GC (mean ± σ):  21.19% ± 11.86%

  ▅▆█▄▂▂                                                       ▁
  ██████▄▃▁▁▁▁▁▁▁▁▄▅▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▄▆█ █
  527 ns        Histogram: log(frequency) by time      33.4 μs <

 Memory estimate: 3.69 KiB, allocs estimate: 1.
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
 Range (min … max):   7.967 μs …  1.640 ms  ┊ GC (min … max):  0.00% … 98.25%
 Time  (median):      8.767 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   10.902 μs ± 39.061 μs  ┊ GC (mean ± σ):  13.31% ±  3.74%

  ▅█▆▄▃▄▄▂                                                    ▂
  ██████████████▇▆▅▆▆▅▄▄▄▄▄▄▁▃▄▄▁▁▁▄▁▁▃▃▄▃▃▁▁▃▁▁▁▁▄▃▁▁▁▁▁▁▃▃▃ █
  7.97 μs      Histogram: log(frequency) by time      33.7 μs <

 Memory estimate: 22.88 KiB, allocs estimate: 191.

Upvotes: 1

Andre Wildberg
Andre Wildberg

Reputation: 19088

Try this base julia

julia> col = size(a, 2);

julia> [a[:,i] .* a[:,j] for i in 1:col for j in (i+1):col]
3-element Vector{Vector{Int64}}:
 [3, 2]
 [4, 10]
 [12, 5]

If you want a matrix try reduce

julia> reduce(hcat, [a[:,i] .* a[:,j] for i in 1:col for j in (i+1):col])
2×3 Matrix{Int64}:
 3   4  12
 2  10   5

Data

julia> a = [[1, 2] [3, 1] [4, 5]];

Upvotes: 1

Shayan
Shayan

Reputation: 6295

I think the following is a better way:

julia> using Combinatorics

julia> function prod_cols_pairs(mat::Matrix{T})::Matrix{T} where T
         idxs = combinations(axes(mat, 2), 2)
         arr[:, getindex.(idxs, 1)] .* arr[:, getindex.(idxs, 2)]
       end;

julia> prod_cols_pairs(arr)
2×3 Matrix{Int64}:
 3   4  12
 2  10   5

# Another example
julia> arr = [
         1 3 4 2
         2 1 5 2
       ]
2×4 Matrix{Int64}:
 1  3  4  2
 2  1  5  2

julia> prod_cols_pairs(arr)
2×6 Matrix{Int64}:
 3   4  2  12  6   8
 2  10  4   5  2  10
  1. I avoided collecting.
  2. I dropped defining an anonymous function.
  3. I decreased the number of copies from 6 to 2. I think this is a good improvement since if my input had 6 columns, this leads to decreasing the number of copies from 38 down to 2 (imagine if I had 50 columns matrix...). It can decline to zero if I use @views. However, I don't know how the performance would be in the case of large Matrixes if I incorporate it.

Please correct me if I'm wrong.

Upvotes: 0

Related Questions