Kim

Reputation: 75

Julia is not using all of the available CPU threads

I'm doing some calculations in Julia and noticed they run significantly slower (about 25 times!) than the numpy counterpart.

Then I realized Julia is only using 8 threads out of the 96 total CPU threads (48 physical cores) on my PC, while numpy seems to have no problem utilizing well over 70 threads.

Running Julia with julia --threads 96 does not make any difference, even though julia> Threads.nthreads() returns 96.

A further disappointment is that, judging by the results so far, I suspect Julia might not match numpy's speed even if it did use all 96 threads.

Here is the Julia code. I simply measure the time with julia> @time calc(mat_a, mat_b), which gives me 90 seconds on average.

using Statistics
using LinearAlgebra

function calc(A::Array{Float32,2}, B::Array{Float32,2})
    μ_A = mean(A, dims=2)
    μ_B = mean(B, dims=2)
    σ_A = cov(A, dims=2)
    σ_B = cov(B, dims=2)

    ssdiff = sum((μ_A - μ_B).^2)
    covmean = sqrt(σ_A * σ_B)  # matrix square root, like scipy's sqrtm
    res = ssdiff + tr(σ_A .+ σ_B .- 2.0 * covmean)

    return res
end

Here is the numpy code, which takes about 3.5 seconds on average (measured with time.perf_counter()).

import numpy as np

from numpy import cov
from numpy import trace
from scipy.linalg import sqrtm

def calc(A, B):
    mu_A = A.mean(axis=0)
    mu_B = B.mean(axis=0)
    sigma_A = cov(A, rowvar=False)
    sigma_B = cov(B, rowvar=False)

    ssdiff = np.sum((mu_A - mu_B) ** 2.0)
    covmean = sqrtm(sigma_A.dot(sigma_B))
    res = ssdiff + trace(sigma_A + sigma_B - 2.0 * covmean)

    return res

Any suggestion/explanation would be greatly appreciated!

Upvotes: 5

Views: 1271

Answers (1)

cbk

Reputation: 4370

Your Julia example code isn't actually using Julia's multithreading capabilities.

In Julia, if you want multithreading, you generally need to enable it explicitly. Functions like mean or cov will not automatically be multithreaded just by virtue of starting Julia with multiple threads; if you want these functions to multithread, you will need to either write multithreaded versions of them, or else use a package that has already done so.
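
For illustration, here is a minimal sketch (not code from the question) of what an explicitly multithreaded version of mean(A, dims=2) could look like using Base.Threads; tmean_rows is a hypothetical name, and this assumes Julia was started with multiple threads (e.g. julia --threads 96):

```julia
using Base.Threads

# Hypothetical sketch: a hand-rolled, multithreaded replacement for
# mean(A, dims=2). Each Julia thread is assigned a chunk of rows and
# averages that row across all columns.
function tmean_rows(A::AbstractMatrix{T}) where {T<:AbstractFloat}
    m, n = size(A)
    μ = Vector{T}(undef, m)
    @threads for i in 1:m
        s = zero(T)
        @inbounds for j in 1:n
            s += A[i, j]
        end
        μ[i] = s / n
    end
    return μ
end
```

In practice, packages such as ThreadsX.jl provide ready-made parallel reductions, so you usually don't need to hand-roll loops like this.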

The only part of your Julia example that will multithread at all, as currently written, is the linear algebra, because that falls back to BLAS (as numpy's likely does too). BLAS has its own multithreading system, entirely separate from Julia's threads (Julia ships OpenBLAS by default, which manages its own thread pool). It is probably not using all 96 threads either; in your case it is almost certainly defaulting to 8. The number of threads used by BLAS in Julia can be checked with

using LinearAlgebra
BLAS.get_num_threads()

or analogously set with

BLAS.set_num_threads(n)
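
So a quick experiment here would be to raise the BLAS thread count toward your physical core count and re-time; 48 matches the hardware in the question, and BLAS libraries typically gain little from hyperthreads, so the physical core count is a reasonable starting point:

```julia
using LinearAlgebra

# Check the default, then raise it to the physical core count.
@show BLAS.get_num_threads()
BLAS.set_num_threads(48)
@show BLAS.get_num_threads()

# Re-time a BLAS-heavy operation (the matrix product in your calc).
A = rand(Float32, 2048, 2048)
@time A * A
```

Note that OpenBLAS may clamp the thread count to the maximum it was built with, so verify with get_num_threads after setting it.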

Upvotes: 6
