Manoj Kumar Maharana

Reputation: 332

How to parse helm chart yaml file using python

I am trying to parse a Helm chart YAML file using Python. The file contains some template expressions in curly braces, which is why I am unable to parse it.

A sample YAML file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Values.nginx.name }}-config-map
  labels:
    app: {{ .Values.nginx.name }}-config-map
data:
  SERVER_NAME: 12.121.112.12
  CLIENT_MAX_BODY: 500M
  READ_TIME_OUT: '500000'

Basically, I could not figure out how to ignore the templated values on the right-hand side.
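A minimal sketch of why the unquoted file fails, assuming PyYAML is installed; the try_parse helper is a hypothetical name. In YAML a value that starts with { opens a flow mapping, so {{ .Values.nginx.name }}-config-map is not read as a plain string, while the same value in quotes loads fine.

```python
import yaml

TEMPLATE = """\
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Values.nginx.name }}-config-map
"""


def try_parse(text: str):
    """Return (data, None) on success or (None, error message) on failure."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as exc:
        # The leading "{" starts a flow mapping, so the trailing
        # "-config-map" leaves the document malformed.
        return None, str(exc)


data, error = try_parse(TEMPLATE)
print(error)
```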

Upvotes: 7

Views: 4912

Answers (4)

ktc

Reputation: 337

A better option is probably to push your content through helm template first and then parse it. (https://stackoverflow.com/a/63502299/5430476)

Yes, templating makes this task much easier.

However, in my case, I need to adjust YAML nodes in anywhere from 1 to 200 Helm template files across different charts.

For instance, I want to add the following label to all YAML files:

metadata:
  # ...
  labels:
    helm.sh/chart: {{ $.Chart.Name }}-{{ $.Chart.Version | replace "+" "_" }}

Inspired by @ilias-antoniadis's answer, I made some improvements to his option 1:

  1. Handling multiple {{ }} placeholders in a single value, as in the example above.
  2. Processing Helm's flow control syntax (e.g., if or range), which doesn’t include a colon (:) on the same line.

#!/usr/bin/env python
import sys
import re
import yaml


class SimpleGoTemplateHandler:
    def __init__(self):
        self.left = "{{"
        self.right = "}}"
        self.left_placeholder = "__GO_TEMPLATE_LEFT__"
        self.right_placeholder = "__GO_TEMPLATE_RIGHT__"
        self.pipe = "|"
        self.pipe_placeholder = "__THE_PIPE__"
        self.control_counter = 0  # Counter for Helm flow control statements.
        self.control_pattern = re.compile(r"^(\s*)(\{\{)")  # e.g. "  {{- if .Values.foo -}}"

    def __gen_control_key(self):
        self.control_counter += 1  # prevent duplicated keys
        return f"__CONTROL_{self.control_counter}__"

    def escape(self, content: str) -> str:
        # handle helm control syntax.
        # e.g.
        #   from:
        #     {{- if .Values.foo -}}
        #     {{- end -}}
        #   to:
        #     __CONTROL_1__: '{{- if .Values.foo -}}'
        #     __CONTROL_2__: '{{- end -}}'
        replaced_lines = []
        for line in content.split("\n"):
            pattern_match = self.control_pattern.match(line)
            if pattern_match:
                indent = pattern_match.group(1)
                # Quote only the statement, not its indentation: a value with leading
                # spaces would be dumped back quoted. Single quotes keep any double
                # quotes or backslashes in the template literal.
                statement = line[len(indent):].replace("'", "''")
                line = f"{indent}{self.__gen_control_key()}: '{statement}'"
            replaced_lines.append(line)

        content = "\n".join(replaced_lines)

        # handle go templates in values
        content = content.replace(self.left, self.left_placeholder).replace(self.right, self.right_placeholder)

        # handle yaml multiline syntax
        content = content.replace(self.pipe, self.pipe_placeholder)
        return content

    def unescape(self, content: str) -> str:
        # undo handle helm control syntax.
        content = re.sub(r"__CONTROL_\d+__: ", "", content)

        # undo handle yaml multiline syntax
        content = content.replace(self.pipe_placeholder, self.pipe)

        # undo handle go template in values
        return content.replace(self.left_placeholder, self.left).replace(self.right_placeholder, self.right)


handler = SimpleGoTemplateHandler()

with open(sys.argv[1]) as f:
    content = f.read()

# safe_load_all returns a lazy generator; turn it into a list so the
# documents can be inspected or modified before dumping.
yaml_data = list(yaml.safe_load_all(handler.escape(content)))

# do something with your data

print(handler.unescape(yaml.dump_all(yaml_data, sort_keys=False, width=1000)))

# OR
# with open(sys.argv[1], "w") as f:
#     f.write(handler.unescape(yaml.dump_all(yaml_data, sort_keys=False, width=1000)))

To handle these cases:

  1. First, process Helm's flow control syntax by prefixing each flow-control line with a dummy key, so that it parses as an ordinary key/value pair.
  2. Replace all {{ and }} with placeholders, so the YAML parser no longer treats them as flow mappings.
  3. Replace all | (pipe characters), since after yaml.dump these can be split into multiple lines.
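The placeholder idea from step 2 can be sketched on its own and applied to the helm.sh/chart label from the start of this answer. This is a reduced illustration assuming PyYAML, not the full handler: it skips flow-control lines and the pipe replacement, and the names escape, unescape and add_chart_label are hypothetical.

```python
import yaml

LEFT, RIGHT = "{{", "}}"
LEFT_PH, RIGHT_PH = "__GO_TEMPLATE_LEFT__", "__GO_TEMPLATE_RIGHT__"

CHART_LABEL = '{{ $.Chart.Name }}-{{ $.Chart.Version | replace "+" "_" }}'


def escape(text: str) -> str:
    # Hide the Go template delimiters so "{" is no longer read as a flow mapping.
    return text.replace(LEFT, LEFT_PH).replace(RIGHT, RIGHT_PH)


def unescape(text: str) -> str:
    # Restore the original delimiters after dumping.
    return text.replace(LEFT_PH, LEFT).replace(RIGHT_PH, RIGHT)


def add_chart_label(manifest: str) -> str:
    """Parse a single-document manifest, add the helm.sh/chart label, dump it back."""
    doc = yaml.safe_load(escape(manifest))
    labels = doc.setdefault("metadata", {}).setdefault("labels", {})
    # Escape the new value too, so it is dumped the same way as the existing ones.
    labels["helm.sh/chart"] = escape(CHART_LABEL)
    return unescape(yaml.safe_dump(doc, sort_keys=False, width=1000))


manifest = """\
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Values.nginx.name }}-config-map
  labels:
    app: {{ .Values.nginx.name }}-config-map
data:
  SERVER_NAME: 12.121.112.12
"""

print(add_chart_label(manifest))
```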

Unfortunately, I’m not familiar with Go, so I’m not sure how to use text/template effectively in this case.

It would be great if there were a handy tool to handle scenarios like this.

Upvotes: 0

Ilias Antoniadis

Reputation: 21

If pre-rendering the chart is not an option, you have two options:

  1. Escape the curly brackets before parsing the chart, process it, and then restore the curly brackets, like this:
import re


def escape_go_templates(content: str) -> str:
    # Replace a whitespace character followed by {{ ... }} with a space and __GO_TEMPLATE__ markers.
    return re.sub(r"\s{{(.*?)}}", r" __GO_TEMPLATE__\1__GO_TEMPLATE__", content)


def unescape_go_templates(content: str) -> str:
    # Turn the __GO_TEMPLATE__ markers back into {{ ... }}.
    return re.sub(r"__GO_TEMPLATE__(.+?)__GO_TEMPLATE__", r"{{\1}}", content)
  2. Use ruamel.yaml and jinja2 templates

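Construct a custom representer for strings that start with {{, so that they are written back out as-is when the YAML is dumped:

from ruamel.yaml.nodes import ScalarNode
from ruamel.yaml.representer import SafeRepresenter


def _represent_str(representer: SafeRepresenter, data: str | None) -> ScalarNode:
    # Strings starting with "{{" are emitted with style="-" instead of the default str handling.
    if data and data.startswith("{{"):
        return representer.represent_scalar("tag:yaml.org,2002:str", data, style="-")

    return representer.represent_str(data)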

Initialize your YAML instance:

from ruamel.yaml import YAML

# typ="jinja2" is provided by the separate ruamel.yaml.jinja2 plugin package.
yaml = YAML(typ="jinja2")
yaml.width = 4096  # Lines in charts can get very long.
yaml.representer.add_representer(str, _represent_str)

This lets you parse Helm charts with Python; you can find more details in my blog post.

Upvotes: 0

Manoj Kumar Maharana

Reputation: 332

I solved it by wrapping every template expression in quotes, like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: "{{ .Values.nginx.name }}-config-map"
  labels:
    app: "{{ .Values.nginx.name }}-config-map"
data:
  SERVER_NAME: 12.121.112.12
  CLIENT_MAX_BODY: 500M
  READ_TIME_OUT: '500000'

Helm can still render this, and since it is valid YAML I can parse it with Python's YAML library.
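A quick check of this approach, assuming PyYAML: the quoted template values load as ordinary strings with the braces intact.

```python
import yaml

QUOTED = """\
apiVersion: v1
kind: ConfigMap
metadata:
  name: "{{ .Values.nginx.name }}-config-map"
  labels:
    app: "{{ .Values.nginx.name }}-config-map"
data:
  SERVER_NAME: 12.121.112.12
  CLIENT_MAX_BODY: 500M
  READ_TIME_OUT: '500000'
"""

config = yaml.safe_load(QUOTED)
# The quoted templates come back as plain strings, braces included.
print(config["metadata"]["name"])  # → {{ .Values.nginx.name }}-config-map
```

Quoting only covers templates that sit inside a value; standalone control lines such as {{- if ... }} cannot be wrapped this way, which is the case the longer answers above deal with.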

Upvotes: 0

coderanger

Reputation: 54249

You would have to write an implementation of Go's text/template library in Python. A better option is probably to push your content through helm template first and then parse it.
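A minimal sketch of the render-then-parse route, assuming the helm CLI is on PATH and PyYAML is installed. helm template renders the chart locally into plain multi-document YAML, which any YAML parser can read. The function names render_chart and parse_manifests, the default release name "demo" and the chart path are hypothetical.

```python
import subprocess
from typing import List, Optional

import yaml


def render_chart(chart_dir: str, release_name: str = "demo", values_file: Optional[str] = None) -> str:
    """Run `helm template` and return the rendered manifests as one string."""
    cmd = ["helm", "template", release_name, chart_dir]
    if values_file:
        cmd += ["-f", values_file]
    # check=True raises CalledProcessError if helm exits with a non-zero status.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def parse_manifests(rendered: str) -> List[dict]:
    """Split the multi-document output and drop empty documents."""
    return [doc for doc in yaml.safe_load_all(rendered) if doc]


# Example (requires helm and a local chart directory such as ./mychart):
#   for manifest in parse_manifests(render_chart("./mychart")):
#       print(manifest["kind"], manifest["metadata"]["name"])
```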

Upvotes: 6
