Yifei

Reputation: 103

How to execute Julia code from the command line?

I have recently been moving my code to Julia. I'm wondering how to execute Julia code from the command line?

I know that Julia code is compiled by running it once.

But the thing is, I need to run a parameter sweep for my simulation models on the cluster, where I can only use the command line, not the REPL.

What is the best practice to run simulation replications on the cluster?

Upvotes: 4

Views: 8205

Answers (5)

Chris Rackauckas

Reputation: 19132

Just call your script using the command line:

julia myscript.jl

But the thing is, I need to run a parameter sweep for my simulation models on the cluster, where I can only use the command line.

I think it's easiest to use Julia's built-in parallelism. pmap usually does the trick. If you're solving differential equations, DifferentialEquations.jl has a function which will parallelize your problem across a cluster, and its internal implementation uses pmap. That can serve as a good reference for how to handle other problems as well.
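
For illustration, a minimal pmap-based sweep could look like the sketch below (run_simulation and the parameter grid are made-up placeholders, not anything from the question):

# Minimal pmap parameter sweep. On Julia 1.0+ addprocs/pmap live in the
# Distributed standard library; on older versions they are in Base.
using Distributed
addprocs(4)                     # or rely on the workers from --machinefile

@everywhere function run_simulation(p)
    return p^2                  # stand-in for the real model
end

params = 0.1:0.1:2.0
results = pmap(run_simulation, params)
println(results)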

Then all you have to do is launch Julia so that it has access to all of the cores. You can do this by passing in the machine file; note that the option must come before the script name, otherwise it is passed to the script as an argument:

julia --machinefile the_machine_file myscript.jl

The machine file is generated whenever you create a batch job (on some clusters you may need to enable MPI for the machine file to show up). For more information, see this blog post.

Upvotes: 6

Przemyslaw Szufel

Reputation: 42214

Please find below some best practices for running a parameter sweep on a Julia HPC cluster. I discuss three issues: (1) simulation architecture, (2) cluster setup, and (3) pre-compilation.

  1. Planning the simulation architecture: as a first step, consider the variance of the computation time across sweep values.

    • If the variance of the computation time is low, you are OK with the suggested pmap. Another good alternative is the @parallel loop.
    • However, if the variance of the computation time is high, those options are not recommended: pmap and @parallel simply divide the tasks equally across all the workers, so the total execution time is the time the slowest worker needs to complete all of the jobs assigned to it.

    Hence, for heterogeneous computation times you need to:

    • store the job numbers (or parameter sweep values) on the master process,
    • launch loops on the slave processes using @spawnat (simply iterate over workers()),
    • have the slave processes poll the master for the next parameter sweep value, e.g. using ParallelDataTransfer.jl (of course, some external database could be used for this purpose instead); a minimal sketch of this pattern is given after this list.
  2. In HPC environments, the best choice for cluster setup is ClusterManagers.jl - it works like a charm, and the PBS system that you mention is supported. The library executes the appropriate PBS commands to add nodes to your Julia cluster. Simple, efficient, and easy to use. The --machinefile option suggested by others is very convenient, but it requires passwordless SSH, which is usually not available (or not easily configurable) on most HPC clusters (unless it is a cluster in a public cloud - for AWS or Azure I would definitely recommend --machinefile). A usage sketch is also given after this list.

    Please note that on some HPC clusters (e.g. Cray) you might need to build Julia separately for the access nodes and the worker nodes due to different hardware architectures. Fortunately, Julia's parallelization works without any problems in such heterogeneous environments.

    Last but not least, you can always use your cluster manager to run separate Julia processes (a grid computing/array computing job). This, however, becomes complicated if computation times are heterogeneous (see the comments in the previous point).

  3. I would not recommend pre-compiling. In most numerical simulation scenarios a single process will run anywhere between 10 minutes and a few days; reducing this by the 10-20 seconds of compilation time is not worth the effort. However, if you still want to, the steps are:

    1. Create a yourimage.jl file with content such as:

       Base.require(:MyModule1)
       Base.require(:MyModule2)

    2. Run:

       $ julia /path/to/julia/Julia/share/julia/build_sysimg.jl /target/image/folder native yourimage.jl

    3. Wait for a message similar to this one:

       INFO: System image successfully built at /target/image/folder/yourimage.so
       INFO: To run Julia with this image loaded, run: julia -J /target/image/folder/yourimage.so

    4. Follow the instructions and run Julia with the -J option.

    You need to repeat the above four steps every time anything in your own code or in the external packages changes.
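
Regarding point 1, below is a minimal sketch of dynamic job distribution using a RemoteChannel from Julia's standard library (used here in place of ParallelDataTransfer.jl; run_simulation and the parameter grid are made-up placeholders):

# Master keeps a queue of sweep values; each worker pulls the next value
# as soon as it is free. Assumes Julia 1.0+ (Distributed standard library).
using Distributed
addprocs(4)

@everywhere function run_simulation(p)   # hypothetical model
    sleep(rand())                        # stand-in for heterogeneous work
    return p^2
end

jobs    = RemoteChannel(() -> Channel{Float64}(32))
results = RemoteChannel(() -> Channel{Tuple{Float64,Float64}}(32))

@everywhere function do_work(jobs, results)
    while true
        p = try take!(jobs) catch; return; end   # queue closed and drained
        put!(results, (p, run_simulation(p)))
    end
end

params = 0.1:0.1:2.0
for w in workers()
    remote_do(do_work, w, jobs, results)         # fire-and-forget on each worker
end
for p in params
    put!(jobs, p)
end
close(jobs)
for _ in params
    println(take!(results))
end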
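
Regarding point 2, usage of ClusterManagers.jl is roughly as follows; addprocs_pbs and its arguments are quoted from memory, so please verify against the package README:

# Assumed ClusterManagers.jl call: submits PBS jobs and attaches the
# resulting nodes as Julia workers. Verify the exact API in the README.
using Distributed, ClusterManagers
addprocs_pbs(20)
println(nworkers())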

Upvotes: 0

Yifei

Reputation: 103

Forgot to mention that I've managed to run Julia from the command line on the cluster.

In the PBS job script, you can add julia run_mytest.jl $parameter. In run_mytest.jl, you can add:

include("mytest.jl")
arg = parse(Float64, ARGS[1])
mytest(arg)
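
For completeness, a minimal sketch of such a PBS job script; the resource requests and the qsub -v mechanism for passing $parameter are assumptions, so adapt them to your cluster:

#!/bin/bash
# Hypothetical job script; submit with e.g.: qsub -v parameter=0.5 job.pbs
#PBS -N mytest
#PBS -l nodes=1:ppn=1
cd $PBS_O_WORKDIR
julia run_mytest.jl $parameter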

Upvotes: 3

vonDonnerstein

Reputation: 103

Assuming what you are trying to achieve is the following:

  • Have one .jl file containing the code, with a shebang (#!/usr/bin/env julia, or similar) at the top.
  • Have another program, bash, etc. call this code (e.g. in bash by running ./mycode.jl).
  • But avoid going through the compilation step each time the code is called, because it creates significant overhead.

Answer:

As others have pointed out, I would think the most julia-nique way of doing this would actually be to do the looping over parameters, distribution of workloads, etc. all within Julia. But if you want to do it as described above, you can use the following little trick:

  • Extract all the code that has to be compiled into a module.
  • The actual file to be called then reduces to:

#!/usr/bin/env julia

using mymodule

mymainfunction(ARGS)

  • Make the module precompilable by adding __precompile__() to the module file (see the Julia documentation for more on this); a sketch is given below.

This way, after the code has been called once per machine, precompiled objects are available, and the aforementioned overhead is reduced effectively to zero.
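
For concreteness, a minimal sketch of what such a module file might look like (mymodule and mymainfunction are the placeholder names from above; on Julia 1.0+ packages are precompiled by default, so __precompile__() is only needed on older versions):

# mymodule.jl -- hypothetical module matching the placeholders above
__precompile__()
module mymodule

export mymainfunction

function mymainfunction(args)
    println("running with arguments: ", args)   # stand-in for the real work
end

end # module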

Upvotes: 0

Kevin L. Keys

Reputation: 995

Julia uses JIT compilation regardless of whether you execute it from the command line, in the REPL, or on a compute cluster.

Is it problematic to run your code once to compile and once more for performance? You can always compile your code using a tiny model or dataset and then run the compiled code on your complete dataset.

If you run on one node, then you can write a function (e.g. my_sim()) containing all of your execution code, and then run your replications in serial as one scheduled job. The first call to my_sim() compiles all of your code, and the subsequent calls run faster.
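
As a sketch of that pattern (my_sim and the inputs are hypothetical placeholders):

# Compile-once, run-many: the first call triggers JIT compilation,
# later calls reuse the compiled code. `my_sim` is a stand-in.
function my_sim(p)
    return p^2                  # replace with the real simulation
end

my_sim(0.0)                     # warm-up call on a tiny input
for p in 0.1:0.1:2.0            # replications now run at full speed
    my_sim(p)
end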

If you run on multiple nodes, then carefully consider how to distribute the jobs; perhaps you can split your parameter settings into groups, assign each group to its own node, and then run my_sim() on each node.

Upvotes: 1
