Localpunter

Reputation: 31

Storing parameters in .yaml file - how to escape?

Does anyone know how to escape characters in YAML?

I am currently creating pipelines with the StreamSets SDK for Python and am now introducing Hydra to store the configs in .yaml files. This should let us tweak or add certain params with Compose and Overrides without changing the config file itself.

One parameter in particular is causing me issues, as I cannot work out how to escape it properly.

In the Cluster tab, spark.home has the value ${runtime:conf('spark_home')} but, written as is in the .yaml file, this raises an error.

The closest I can get without an error is with $\{runtime:conf('spark_home')} (unquoted) which outputs $\\{runtime:conf('spark_home')}.

I have read through various docs on this, but none cover having curly braces, brackets and a quoted string within the parameter, and I have tried what feels like hundreds of combinations.

I have wrapped the whole value and spark_home in single and double quotes, tried it unquoted with special/illegal characters escaped, and deconstructed the value to build it up bit by bit to pinpoint where it errors.

The main issue, I think, is the quoted string 'spark_home'. If I remove the single quotes I have no issues. As a side note, 'spark_home' can be wrapped in either single or double quotes, but the quotes cannot be removed.

UPDATE: Hi, thanks for the answers so far. I didn't want to overload the initial question. I am at the initial stages of looking at this and testing whether YAML will provide a cleaner solution. As stated, I want to be able to increase the memory and drivers for some pipelines, so I am using initialize and compose to override certain parameters. Here is a simplified version of my setup. In main.py I have:

from hydra import compose, initialize  # Hydra 1.1+; in 1.0 these live in hydra.experimental

initialize(config_path="conf", job_name="test_pipeline")
cfg = compose(config_name="config", overrides=[])

And in my config.yaml file:

defaults:
  - streamsets: dev

cluster:
  spark.driver.memory: 8G
  spark.driver.cores: 2
  spark.executor.memory: 8G
  spark.executor.cores: 2
  spark.home: "${runtime:conf('spark_home')}"

If I print(cfg) with the current setup, I get a lengthy error, all from python3.7/site-packages/... (not sure where I can post this), but the end part has:

    raise GrammarParseError(str(e) if msg is None else msg) from e
hydra.errors.ConfigCompositionException

If I add a backslash before the opening curly brace and remove the double quotes, i.e. spark.home: $\{runtime:conf('spark_home')}, it prints, but I get a double escape in the spark.home parameter. Again, I have tried a lot of combinations and read various docs, none of which cover this complexity:

{'spark.driver.memory': '8G', 'spark.driver.cores': 2, 'spark.executor.memory': '8G', 'spark.executor.cores': 2, 'spark.home': "$\\{runtime:conf('spark_home')}"}
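A note on the doubled backslash in that output: it may only be a display effect. Python's dict formatting shows each string through repr(), and repr() writes a single backslash as \\. The sketch below uses only the standard library. The variable names are illustrative, and it assumes the loaded string holds exactly the characters typed in the YAML file and that the printed config follows the usual dict formatting.

```python
# Hypothetical stand-in for the string loaded from the YAML file:
# a dollar sign, ONE backslash, then the rest of the expression.
value = "$\\{runtime:conf('spark_home')}"

# A dict's text form uses repr() for its values, so the single
# backslash is displayed as two.
as_shown_in_dict = str({"spark.home": value})

print(as_shown_in_dict)   # dict-style display, backslash appears doubled
print(value)              # the string itself, backslash appears once
print(value.count("\\"))  # number of backslash characters actually stored
```

If that assumption holds, the stored value has one backslash, not two. The backslash is still part of the value, though, so this accounts for the doubling in the printout but does not yet give the plain ${runtime:conf('spark_home')} that StreamSets expects.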

I am using PyCharm (which is what my company uses for Python) and it doesn't highlight any issue when I create the YAML file. I have also re-created a simple version on my own machine using VS Code with the YAML extension, and this doesn't highlight any issue either. Thanks
