varantir

Reputation: 6854

Saving data in parallel in julia

I am confronted with a problem when submitting many jobs to a cluster, where each job calculates some data and saves it (as a .jld file containing many variables) to some drive, for example like this:

using JLD

function f(savedir, pid, params)
    ...
    # each job writes its own file, named after its job id
    save(joinpath(savedir, "$(pid).jld"), result)
end

After the calculation I need to process the data, loading each .jld file to access its variables individually. Even though the final reduction is rather small, this takes a lot of time. I thought about saving it all to one .jld file, but then I run into the problem that the file may be accessed by several jobs at the same time, since they run in parallel. I also thought about collecting the data in an out-of-core fashion using JuliaDB, but in the end I do not see why this should be any better. I know that this could be solved with some database server, but that seems to be overkill for my problem. How do you deal with this kind of problem?

Best,

v.

Upvotes: 1

Views: 427

Answers (1)

Przemyslaw Szufel

Reputation: 42234

If the data is small, simply use the IOBuffer mechanism to serialize the results on the workers and send them to the master:

using Distributed, Serialization
addprocs(4)
@everywhere using Distributed, Serialization

rrs = @distributed (hcat) for i in 1:12
    b = IOBuffer()
    myres = (rand(), randn(), myid()) # emulates some big computation
                                      # that you are running
    serialize(b, myres)
    take!(b) # the serialized bytes; all results have equal length here,
             # so the (hcat) reduction yields a Matrix{UInt8}
end

And here is sample code deserializing the results back:

julia> for i in 1:size(rrs,2)
           res = deserialize(IOBuffer(@view rrs[:, i]))
           println(res)
       end
(0.8656737453513623, 1.0594978554855077, 2)
(0.6637467726391784, 0.35682413048990763, 2)
(0.32579653913039386, 0.2512902466296038, 2)
(0.3033490905926888, 1.7662416364260713, 3)
...
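
Note that the (hcat) reduction only works because every tuple above serializes to the same number of bytes. If your results can vary in size, a minimal variation (same setup as above) is to collect one byte vector per result with a (vcat) reduction:

rrs = @distributed (vcat) for i in 1:12
    b = IOBuffer()
    serialize(b, (rand(), randn(), myid()))
    [take!(b)] # one-element Vector, so (vcat) builds a Vector of byte vectors
end

for bytes in rrs
    println(deserialize(IOBuffer(bytes)))
end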

If your data is too big and your cluster is distributed, then you need some other orchestration mechanism. One lightweight solution that I sometimes use is the following set of bash scripts: https://github.com/pszufe/KissCluster. This tool is built around the following bash command, which is very useful in any file-based scenario:

nohup seq $start $end | xargs --max-args=1 --max-procs=$nproc julia run.jl &>> somelogfile.txt &
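
For context, here is a minimal sketch of what run.jl could look like; the file name comes from the command above, while savedir and the computation are hypothetical placeholders:

# run.jl -- invoked as `julia run.jl <jobid>` by the xargs line above
using JLD

jobid = parse(Int, ARGS[1])      # the index piped in by `seq`
savedir = "results/"             # hypothetical output directory
result = Dict("x" => rand())     # stand-in for the real computation
save(joinpath(savedir, "$(jobid).jld"), result)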

Nevertheless, when possible, consider using Julia's Distributed package.
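
For example, when each result fits in the master's memory, a plain pmap already serializes each worker's return value and ships it back for you; a minimal sketch:

using Distributed
addprocs(4)

# pmap runs the function on the workers and returns the results to the master
results = pmap(1:12) do i
    (rand(), randn(), myid()) # stand-in for the real computation
end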

Upvotes: 2
