Pmap with Euclidean Distance Operations

Question

I am very new to the Julia programming language and I am testing out some Euclidean distance operations I typically perform in other languages. The functions work if calling it serially, but the pmap calls are not returning the desired results. Could someone take a look and let me know if I am going about this the right way? is pmap even the best way to approach this?

using Distributed

#Example data
d1 = randn(50000,3) 
d2 = randn(50000,3)

First Function: Euclidean Distance Matrix

function EDM(m1, m2)
    n1 = size(m1, 1)
    n2 = size(m2,1)
    k = size(m1, 2)
    Dist = zeros(n1,n2)
    for i in 1:n1
        for j in 1:n2
            dtemp = 0
            for a in 1:k
                dtemp += (m1[i,a] - m2[j,a]) ^ 2
            end
            Dist[i,j] = sqrt(dtemp)
        end
    end
    return Dist
end

#pmap call
function pmap_EDM(m1,m2)
    return pmap(EDM, m1, m2)
end

Second Function: Minimum Euclidean Distances Unidirectional

function MED(m1, m2)
    n1 = size(m1, 1)
    n2 = size(m2,1)
    k = size(m1, 2)
    Dist = zeros(n1,1)
    for i in 1:n1
        dsum = Inf
        for j in 1:n2
            dtemp = 0
            for a in 1:k
                dtemp += (m1[i,a] - m2[j,a]) ^ 2
            end
            dtemp = sqrt(dtemp)
            if dtemp < dsum
                dsum = copy(dtemp)
            end
        end
        Dist[i,1] = dsum
    end
    return Dist
end

#pmap call
function pmap_MED(m1,m2)
    return pmap(MED, m1, m2)
end

Third Function: Minimum Euclidean Distances and Corresponding Indices Unidirectional

function MEDI(m1, m2)
    n1 = size(m1, 1)
    n2 = size(m2,1)
    k = size(m1, 2)
    Dist = zeros(n1,2)
    for i in 1:n1
        dsum = Inf
        dsum_ind = 0
        for j in 1:n2
            dtemp = 0
            for a in 1:k
                dtemp += (m1[i,a] - m2[j,a]) ^ 2
            end
            dtemp = sqrt(dtemp)
            if dtemp < dsum
                dsum = copy(dtemp)
                dsum_ind = copy(j)
            end
        end
        Dist[i,1] = dsum
        Dist[i,2] = dsum_ind
    end
    return Dist
end

#pmap call
function pmap_MEDI(m1,m2)
    return pmap(MEDI, m1, m2)
end

Calling functions

r1 = EDM(d1,d2) #serial
r2 = pmap_EDM(d1,d2)

r3 = MED(d1,d2) #serial
r4 = pmap_MED(d1,d2)

r5 = MEDI(d1,d2) #serial
r6 = pmap_MEDI(d1,d2)

Edited:

The first function should return a simple Euclidean distance matrix with the distances between each row in one array to every row in the second array. The second and third functions are deviations of this to return a subset of those distances based on the minimum distance for each row in one array to every other row in another array (with the third function returning the index position of the minimum distance). The distances do not appear to be calculated correctly and the latter two functions using pmap are returning an nx3 matrix rather than nx1 and nx2 respectively.

Edited 2: example using smaller data set to show results

d1 = randn(5,3) 
d2 = randn(5,3) 

julia> EDM(d1,d2)
5×5 Array{Float64,2}:
 2.60637  3.18867  1.0745    2.60328  1.58608 
 1.2763   2.31037  3.04379   2.74113  2.00452 
 1.70024  2.07731  3.12397   2.60893  2.05932 
 2.44581  1.57345  0.910323  1.08718  0.407675
 3.42936  1.13001  2.18345   1.08764  1.70883 

julia> pmap_EDM(d1,d2)
5×3 Array{Array{Float64,2},2}:
 [0.397928]  [2.39283]   [0.953501]
 [1.06776]   [0.815057]  [1.87973] 
 [0.151963]  [3.05161]   [0.650967]
 [0.571021]  [0.275554]  [0.883151]
 [0.109293]  [0.635398]  [1.58254] 

julia> MED(d1,d2)
5×1 Array{Float64,2}:
 1.0744953977891307 
 1.2762979313081781 
 1.7002448697495505 
 0.40767454400155695
 1.0876399289364607 

julia> pmap_MED(d1,d2)
5×3 Array{Array{Float64,2},2}:
 [0.397928]  [2.39283]   [0.953501]
 [1.06776]   [0.815057]  [1.87973] 
 [0.151963]  [3.05161]   [0.650967]
 [0.571021]  [0.275554]  [0.883151]
 [0.109293]  [0.635398]  [1.58254] 

julia> MEDI(d1,d2)
5×2 Array{Float64,2}:
 1.0745    3.0
 1.2763    1.0
 1.70024   1.0
 0.407675  5.0
 1.08764   4.0

julia> pmap_MEDI(d1,d2)
5×3 Array{Array{Float64,2},2}:
 [0.397928 1.0]  [2.39283 1.0]   [0.953501 1.0]
 [1.06776 1.0]   [0.815057 1.0]  [1.87973 1.0] 
 [0.151963 1.0]  [3.05161 1.0]   [0.650967 1.0]
 [0.571021 1.0]  [0.275554 1.0]  [0.883151 1.0]
 [0.109293 1.0]  [0.635398 1.0]  [1.58254 1.0]

Edited 3: @distributed version of function two

using Distributed
using SharedArrays

#Minimum Euclidean Distances Unidirectional
@everywhere function MD(v1, m2)
    n = size(m2, 1)
    dsum = Inf
    for j in 1:n
        dtemp = sqrt((v1[1] - m2[j,1]) ^ 2 + (v1[2] - m2[j,2]) ^ 2 + (v1[3] - m2[j,3]) ^ 2)
        if dtemp < dsum
            dsum = dtemp
        end
    end
    return dsum
end

function MED(m1, m2)
    n1 = size(m1,1)
    Dist = SharedArray{Float64}(n1)
    m3 = SharedArray{Float64}(m2)
    @sync @distributed for k in 1:n1
        Dist[k] = MD(m1[k,:], m3)
    end
    return Dist
end

Pmap with Euclidean Distance Operations

Edited:

Edited 2: example using smaller data set to show results

Edited 3: @distributed version of function two

Answers (1)

Related Questions