AlirezaGH

Reputation: 399

How to parallelize a simple for loop in Julia?

I have already written serial code for solving the Laplace equation, but when I tried to parallelize it in Julia, the parallel version took more time and memory than the serial one. Below is a simplified example. How can I parallelize this code?

There is a domain t1.

t2 is computed from t1, and then t1 = t2.

@everywhere function left!(t1,t2,n,l_type,b_left,dx=1.0,k=50.0)
    if l_type==1
        for i=1:n
            t2[i,1]=(b_left*dx/k)+t1[i,2];
            t1[i,1]=t2[i,1];
        end
    else
        for i=1:n
            t1[i,1]=b_left;
        end
    end
    return t1
end

# The parallel version of left! does not work.
@everywhere function pleft!(t1,t2,n,l_type,b_left,dx=1.0,k=50.0)
    if l_type==1
        @parallel for i=1:n
            t2[i,1]=(b_left*dx/k)+t1[i,2];
            t1[i,1]=t2[i,1];
        end
    else
        @parallel for i=1:n
            t1[i,1]=b_left;
        end
    end
    return t1
end
n = 10;
t1 = SharedArray(Float64,(n,n));
t2=t1;
typ = 0;
value = 10;
dx = 1;
k=50;

@time t3 = pleft!(t1,t2,n,typ,value,dx,k)
@time t2 = left!(t1,t2,n,typ,value,dx,k)

The output is:

0.000872 seconds (665 allocations: 21.328 KB) # parallel version
0.000004 seconds (4 allocations: 160 bytes)   # serial version

How can I fix this?

After this boundary update, I need to run the code below inside a while loop. I need to parallelize it too.

@everywhere function oneStepseri(t1,N)
    t2 = t1;
    for j = 2:(N-1)
        for i = 2:(N-1)
            t2[i,j]=0.25*(t1[i-1,j]+t1[i+1,j]+t1[i,j-1]+t1[i,j+1]);
        end
    end
    return t2;
end

thanks...

Upvotes: 5

Views: 2669

Answers (1)

AlirezaGH

Reputation: 399

I tried many things: @parallel with SharedArray, distributed arrays, domain decomposition, and @spawn. None of them gave a speedup. Recently, however, Julia added threads. You enable them by running export JULIA_NUM_THREADS=4 in the shell before starting Julia, and you can check the number of threads with Threads.nthreads(). With Threads.@threads you can parallelize a for loop. Here is my code, which gives me a good speedup.

# to enable threads, run `export JULIA_NUM_THREADS=4` in the shell before starting Julia

nth = Threads.nthreads(); #print number of threads

println(nth);

a = zeros(10);

Threads.@threads for i = 1:10
           a[i] = Threads.threadid()
       end

show(a)

b = zeros(100000);
c = zeros(100000);
b[1] = b[end] = 1;
c[1] = c[end] = 1;

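# Serial sweep. Note that `B = A` does not copy: B is another name for A,
# so the update happens in place and later iterations see earlier results.
function noth(A)
    B = A;
    for i=2:(length(A)-1)
        B[i] = (A[i-1] + A[i+1])*0.5;
    end 
    return B
end

# Threaded sweep. B aliases A here too, so near the edges of each thread's
# chunk a neighbouring value may be read while another thread is writing it.
function th(A)
    B = A;
    Threads.@threads for i=2:(length(A)-1)
        B[i] = (A[i-1] + A[i+1])*0.5;
    end
    return B
end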


println("warmup noth , th")
@time bb = noth(b)
@time cc = th(c)
println("end ")
@time bb = noth(b)
@time cc = th(c)

@time bb = noth(b)
@time cc = th(c)

@time bb = noth(b)
@time cc = th(c)
@time bb = noth(b)
@time cc = th(c)
@time bb = noth(b)
@time cc = th(c)
@time bb = noth(b)
@time cc = th(c)
show(bb[10])
println("\nbb ------------------------------------------------------------------------------------------------------------------> cc")
show(cc[10])

The output looks like this:

5                                                                                                                                                                    
[1.0,1.0,2.0,2.0,3.0,3.0,4.0,4.0,5.0,5.0]warmup noth , th                                                                                                            
  0.008661 seconds (2.53 k allocations: 113.180 KB)                                                                                                                  
  0.020738 seconds (7.94 k allocations: 336.981 KB)                                                                                                                  
end                                                                                                                                                                  
  0.000446 seconds (4 allocations: 160 bytes)                                                                                                                        
  0.000122 seconds (6 allocations: 224 bytes)                                                                                                                        
  0.000437 seconds (4 allocations: 160 bytes)                                                                                                                        
  0.000135 seconds (6 allocations: 224 bytes)                                                                                                                        
  0.000435 seconds (4 allocations: 160 bytes)                                                                                                                        
  0.000115 seconds (6 allocations: 224 bytes)                                                                                                                        
  0.000447 seconds (4 allocations: 160 bytes)                                                                                                                        
  0.000112 seconds (6 allocations: 224 bytes)                                                                                                                        
  0.000440 seconds (4 allocations: 160 bytes)                                                                                                                        
  0.000109 seconds (6 allocations: 224 bytes)                                                                                                                        
  0.000439 seconds (4 allocations: 160 bytes)                                                                                                                        
  0.000116 seconds (6 allocations: 224 bytes)                                                                                                                        
0.052478790283203125                                                                                                                                                 
bb ------------------------------------------------------------------------------------------------------------------> cc                                            
0.052478790283203125

This run used 5 threads and 100000 nodes.
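One caveat about the th function above: because of B = A, every thread reads and writes the same array. A possible way around this is to write into a separate output buffer. The following is only a sketch, assuming a recent Julia 1.x; smooth_threads! is a hypothetical name that does not appear in the code above. Because all reads come from the old values, its result is not identical to the in-place sweep done by noth.

```julia
# Sketch only: `smooth_threads!` is a hypothetical name, not part of the code above.
# Reads come from A and writes go to B, so no thread reads a value
# that another thread is writing.
function smooth_threads!(B, A)
    n = length(A)
    B[1] = A[1]          # copy the fixed end values
    B[n] = A[n]
    Threads.@threads for i in 2:(n - 1)
        B[i] = (A[i - 1] + A[i + 1]) * 0.5
    end
    return B
end

A = zeros(100000)
A[1] = A[end] = 1
B = similar(A)           # separate buffer; `B = A` would only alias A
smooth_threads!(B, A)
```

Each iteration writes only to its own B[i], so the loop iterations are independent and the order in which threads run does not change the result.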

Note that the warm-up calls show no speedup, because the first call of each function includes compilation time. The later calls do show a speedup:

0.000446 seconds (4 allocations: 160 bytes)   # serial run
0.000122 seconds (6 allocations: 224 bytes)   # threaded run
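The same idea might be applied to the 2D update step from the question (oneStepseri). The sketch below is not the code I timed above. It assumes a recent Julia 1.x and two distinct N-by-N arrays; one_step_threads! and relax are hypothetical names introduced only for illustration.

```julia
# Sketch only: `one_step_threads!` and `relax` are hypothetical names.
# Assumes t1 and t2 are distinct N-by-N arrays (t2 must not alias t1).
function one_step_threads!(t2, t1)
    N = size(t1, 1)
    # Thread over columns: Julia arrays are column-major, so the inner
    # loop over i walks contiguous memory within each column.
    Threads.@threads for j in 2:(N - 1)
        for i in 2:(N - 1)
            t2[i, j] = 0.25 * (t1[i - 1, j] + t1[i + 1, j] +
                               t1[i, j - 1] + t1[i, j + 1])
        end
    end
    return t2
end

# Run `steps` sweeps, swapping the two buffers instead of copying.
function relax(t0, steps)
    a = copy(t0)         # both buffers start with the boundary values
    b = copy(t0)
    for _ in 1:steps
        one_step_threads!(b, a)
        a, b = b, a      # the newest values are now in `a`
    end
    return a
end

t0 = zeros(100, 100)
t0[:, 1] .= 10.0         # fixed value on the left boundary, as in the question
result = relax(t0, 50)
```

As with the 1D timings, the first call to relax includes compilation, so it is better to time a second call. For small grids such as the 10-by-10 example in the question, the cost of starting the threaded loop may outweigh the work done per sweep.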

Upvotes: 3
