Reputation: 342
I am very new to the Julia programming language and I am testing out some Euclidean distance operations I typically perform in other languages. The functions work if calling it serially, but the pmap calls are not returning the desired results. Could someone take a look and let me know if I am going about this the right way? is pmap even the best way to approach this?
using Distributed
#Example data
d1 = randn(50000,3)
d2 = randn(50000,3)
First Function: Euclidean Distance Matrix
function EDM(m1, m2)
n1 = size(m1, 1)
n2 = size(m2,1)
k = size(m1, 2)
Dist = zeros(n1,n2)
for i in 1:n1
for j in 1:n2
dtemp = 0
for a in 1:k
dtemp += (m1[i,a] - m2[j,a]) ^ 2
end
Dist[i,j] = sqrt(dtemp)
end
end
return Dist
end
#pmap call
function pmap_EDM(m1,m2)
return pmap(EDM, m1, m2)
end
Second Function: Minimum Euclidean Distances Unidirectional
function MED(m1, m2)
n1 = size(m1, 1)
n2 = size(m2,1)
k = size(m1, 2)
Dist = zeros(n1,1)
for i in 1:n1
dsum = Inf
for j in 1:n2
dtemp = 0
for a in 1:k
dtemp += (m1[i,a] - m2[j,a]) ^ 2
end
dtemp = sqrt(dtemp)
if dtemp < dsum
dsum = copy(dtemp)
end
end
Dist[i,1] = dsum
end
return Dist
end
#pmap call
function pmap_MED(m1,m2)
return pmap(MED, m1, m2)
end
Third Function: Minimum Euclidean Distances and Corresponding Indices Unidirectional
function MEDI(m1, m2)
n1 = size(m1, 1)
n2 = size(m2,1)
k = size(m1, 2)
Dist = zeros(n1,2)
for i in 1:n1
dsum = Inf
dsum_ind = 0
for j in 1:n2
dtemp = 0
for a in 1:k
dtemp += (m1[i,a] - m2[j,a]) ^ 2
end
dtemp = sqrt(dtemp)
if dtemp < dsum
dsum = copy(dtemp)
dsum_ind = copy(j)
end
end
Dist[i,1] = dsum
Dist[i,2] = dsum_ind
end
return Dist
end
#pmap call
function pmap_MEDI(m1,m2)
return pmap(MEDI, m1, m2)
end
Calling functions
r1 = EDM(d1,d2) #serial
r2 = pmap_EDM(d1,d2)
r3 = MED(d1,d2) #serial
r4 = pmap_MED(d1,d2)
r5 = MEDI(d1,d2) #serial
r6 = pmap_MEDI(d1,d2)
The first function should return a simple Euclidean distance matrix with the distances between each row in one array to every row in the second array. The second and third functions are deviations of this to return a subset of those distances based on the minimum distance for each row in one array to every other row in another array (with the third function returning the index position of the minimum distance). The distances do not appear to be calculated correctly and the latter two functions using pmap are returning an nx3 matrix rather than nx1 and nx2 respectively.
d1 = randn(5,3)
d2 = randn(5,3)
julia> EDM(d1,d2)
5×5 Array{Float64,2}:
2.60637 3.18867 1.0745 2.60328 1.58608
1.2763 2.31037 3.04379 2.74113 2.00452
1.70024 2.07731 3.12397 2.60893 2.05932
2.44581 1.57345 0.910323 1.08718 0.407675
3.42936 1.13001 2.18345 1.08764 1.70883
julia> pmap_EDM(d1,d2)
5×3 Array{Array{Float64,2},2}:
[0.397928] [2.39283] [0.953501]
[1.06776] [0.815057] [1.87973]
[0.151963] [3.05161] [0.650967]
[0.571021] [0.275554] [0.883151]
[0.109293] [0.635398] [1.58254]
julia> MED(d1,d2)
5×1 Array{Float64,2}:
1.0744953977891307
1.2762979313081781
1.7002448697495505
0.40767454400155695
1.0876399289364607
julia> pmap_MED(d1,d2)
5×3 Array{Array{Float64,2},2}:
[0.397928] [2.39283] [0.953501]
[1.06776] [0.815057] [1.87973]
[0.151963] [3.05161] [0.650967]
[0.571021] [0.275554] [0.883151]
[0.109293] [0.635398] [1.58254]
julia> MEDI(d1,d2)
5×2 Array{Float64,2}:
1.0745 3.0
1.2763 1.0
1.70024 1.0
0.407675 5.0
1.08764 4.0
julia> pmap_MEDI(d1,d2)
5×3 Array{Array{Float64,2},2}:
[0.397928 1.0] [2.39283 1.0] [0.953501 1.0]
[1.06776 1.0] [0.815057 1.0] [1.87973 1.0]
[0.151963 1.0] [3.05161 1.0] [0.650967 1.0]
[0.571021 1.0] [0.275554 1.0] [0.883151 1.0]
[0.109293 1.0] [0.635398 1.0] [1.58254 1.0]
using Distributed
using SharedArrays
#Minimum Euclidean Distances Unidirectional
@everywhere function MD(v1, m2)
n = size(m2, 1)
dsum = Inf
for j in 1:n
dtemp = sqrt((v1[1] - m2[j,1]) ^ 2 + (v1[2] - m2[j,2]) ^ 2 + (v1[3] - m2[j,3]) ^ 2)
if dtemp < dsum
dsum = dtemp
end
end
return dsum
end
function MED(m1, m2)
n1 = size(m1,1)
Dist = SharedArray{Float64}(n1)
m3 = SharedArray{Float64}(m2)
@sync @distributed for k in 1:n1
Dist[k] = MD(m1[k,:], m3)
end
return Dist
end
Upvotes: 1
Views: 96
Reputation: 1905
I did not went into the details of your code, but could it be that you apply pmap
at the wrong code level?
For instance if you have the following serial code
for i = 1:imax
# do some work
end
You would write this as:
function function_for_single_iteration(i)
# do some work
end
pmap(function_for_single_iteration,1:imax)
Essentially the pmap replaces a (outer) for loop.
Before using pmap
, I usually first use the serial map
function to check that I have the same results.
Note that pmap
and map
would return a vector. In your case, probably a vector of vectors of distances. You would need to use cat
to turn this into a matrix.
Upvotes: 1