Reputation: 8592
I have a main config file, say config.yaml:
num_layers: 4
embedding_size: 512
learning_rate: 0.2
max_steps: 200000
I'd like to be able to override this on the command line with another file, say big_model.yaml, which I'd use conceptually like:
python my_script.py --override big_model.yaml
and big_model.yaml might look like:
num_layers: 8
embedding_size: 1024
I'd like to be able to override with an arbitrary number of such files, each one taking priority over the ones before it. Let's say I also have fast_learn.yaml:
learning_rate: 2.0
And so I'd then want to conceptually do something like:
python my_script.py --override big_model.yaml --override fast_learn.yaml
What is the easiest/most standard way to do this in Hydra (or perhaps in OmegaConf)?
(Note: ideally these override files would just be standard YAML files that override the earlier YAML files. If I have to use the override DSL instead, I can do that, if that's the easiest/best/most standard way.)
Upvotes: 8
Views: 15414
Reputation: 1036
Omry's (the library author) answer is correct and very concise. This answer expands on his by directly answering your scenario.
First, we have the following file structure:
my_app.py
conf/
    config.yaml
    variants/
        size/
            big_model.yaml
        train/
            fast_learn.yaml
my_app.py:
from omegaconf import DictConfig, OmegaConf
import hydra

@hydra.main(version_base=None, config_path="conf", config_name="config")
def my_app(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()
conf/config.yaml:
num_layers: 4
embedding_size: 512
learning_rate: 0.2
max_steps: 200000
conf/variants/size/big_model.yaml:
# @package _global_
num_layers: 8
embedding_size: 1024
conf/variants/train/fast_learn.yaml:
# @package _global_
learning_rate: 2.0
Now run the following (as previously explained by Omry):
python my_app.py +variants/size=big_model +variants/train=fast_learn
The following is the output:
num_layers: 8
embedding_size: 1024
learning_rate: 2.0
max_steps: 200000
The difference from Omry's answer is that we use nested YAML files instead of putting all the changes in the same experiment/exp1.yaml.
Instead, you could be more concise:
conf/
    config.yaml
    big_model/
        c.yaml
    fast_learn/
        c.yaml
Then, you would only need to write:
python my_app.py +big_model=c +fast_learn=c
Although it is more concise, I think the previous layout should be preferred as it is more explicit.
The YAML files are applied in the order they appear on the command line, first to last, so a file specified later takes priority over one specified earlier.
Let's say we modify fast_learn.yaml to the following:
# @package _global_
learning_rate: 2.0
num_layers: 20000
num_layers is set in both fast_learn.yaml and big_model.yaml, so one will override the other. If you run:
python my_app.py +variants/size=big_model +variants/train=fast_learn
The output will be:
num_layers: 20000
embedding_size: 1024
learning_rate: 2.0
max_steps: 200000
If you run:
python my_app.py +variants/train=fast_learn +variants/size=big_model
You will get:
num_layers: 8
embedding_size: 1024
learning_rate: 2.0
max_steps: 200000
Upvotes: 1
Reputation: 33646
Refer to the basic tutorial and read about config groups.
You can create arbitrary config groups and select one option from each (as of Hydra 1.0, config group options are mutually exclusive). You will need two config groups here: one can be model, with normal, small and big options, and another can be trainer, with maybe normal and fast options.
Config groups can also override things in other config groups. You can also always append to the defaults list from the command line, so you can add additional config groups that are only used from the command line. An example of that is an 'experiment' config group. You can use it as:
$ python train.py +experiment=exp1
In config groups like this, which override things across the entire config, you should use the global package (read more about packages in the docs):
# @package _global_
num_layers: 8
embedding_size: 1024
learning_rate: 2.0
Upvotes: 4
Reputation: 184
It sounds like package overrides might be a good solution for you.
The documentation can be found here: https://hydra.cc/docs/next/advanced/overriding_packages
An example application can be found here: https://github.com/facebookresearch/hydra/tree/master/examples/advanced/package_overrides
Using that example application, you can achieve the override by doing something like:
$ python simple.py db=postgresql db.pass=helloworld
db:
  driver: postgresql
  user: postgre_user
  pass: helloworld
  timeout: 10
Upvotes: 1