Rafael Colucci

Reputation: 6078

Yaml load converting string to UTF8?

I have this YAML:

---
test: {"gender":0,"nacionality":"Alem\u00e3o"}

I am reading it with Python 3.5 as follows:

import yaml

with open('teste.yaml', 'r') as stream:
    doc = yaml.load_all(stream)
    for line in doc:
        print(line)

This is the result I get:

{'test': {'gender': 0, 'nacionality': 'Alemão'}}

But if I replace " with ' in my YAML, I get this:

{'test': {'nacionality': 'Alem\\u00e3o', 'gender': 0}}

As you can see, when I use " the escape sequence in Alem\u00e3o is decoded to the Unicode character, but with ' it is not.

So I have two questions:

Why do I get different outputs when I use ' and "?
What can I do to get the output as Alem\u00e3o when using "?

Upvotes: 0

Views: 4391

Answers (2)

Anthon

Reputation: 76568

Backslash escaping in YAML is only available in double-quoted scalars. It is not available in single-quoted scalars, plain (unquoted) scalars, or (literal) block scalars.
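
As a rough illustration of that rule, the sketch below loads the same characters written in three scalar styles. It assumes PyYAML is installed and imported as yaml (as in the question); the key names double, single and plain are made up for the example.

```python
import yaml  # PyYAML; assumed to be installed

# The same characters written in three YAML scalar styles.
# A raw Python string is used so Python itself leaves the backslash alone.
source = r'''
double: "Alem\u00e3o"
single: 'Alem\u00e3o'
plain: Alem\u00e3o
'''

data = yaml.safe_load(source)

# Only the double-quoted scalar has its \u00e3 escape decoded by YAML.
print(data["double"])  # Alemão
print(data["single"])  # Alem\u00e3o
print(data["plain"])   # Alem\u00e3o
```

In the single-quoted and plain cases the backslash is an ordinary character, which is why printing the enclosing dict shows it doubled: that is Python's repr, not something YAML added.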

To get the output you want, the best way is to drop the quotes altogether and use this as input:

---
test: {gender: 0, nacionality: Alem\u00e3o}

Your program, however, could use some improvement.

  1. You should never use load_all() or load() on this kind of non-tagged YAML. That is unsafe and can lead to arbitrary code being executed on your machine if you don't have complete control over the source YAML. Newer versions of ruamel.yaml will throw a warning if you don't explicitly specify the unsafe Loader as an argument. Do yourself a favour and get into the habit of using safe_load() and safe_load_all().
  2. load_all() returns an iterator over documents, so doc and line are misleading variable names. You should use:

    import ruamel.yaml as yaml
    
    with open('teste.yaml', 'r') as stream:
        for doc in yaml.safe_load_all(stream):
            print(doc)
    

    or, if there is always just one document in teste.yaml, you can simplify that to:

    import ruamel.yaml as yaml
    
    with open('teste.yaml') as stream:
        print(yaml.safe_load(stream))
    

    both of which will give you:

    {'test': {'gender': 0, 'nacionality': 'Alem\\u00e3o'}}
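
To make the safety point from item 1 concrete, here is a minimal sketch, assuming PyYAML (imported as yaml, as in the question) rather than ruamel.yaml. The input string is a made-up, deliberately harmless payload that asks the loader to call a Python function; the safe loader should refuse it instead of executing anything.

```python
import yaml  # PyYAML; assumed to be installed

# A document that asks the loader to call a Python function.
# An unsafe loader could act on this tag; the payload is harmless on purpose.
untrusted = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load only knows the standard YAML tags, so it refuses this one.
    rejected = True
    print("rejected:", type(exc).__name__)
```

Plain data such as the mapping in teste.yaml carries no such tags, so switching to the safe functions costs nothing here.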
    

Please note that YAML requires a space after the : that separates key and value in a mapping. Dropping the space is allowed only for compatibility with JSON, and only when the key is quoted (double and single quotes both work). So this works as input as well:

---
test: {"gender":0, 'nacionality':Alem\u00e3o}
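
A quick way to check that claim is to load the spaced and the compact form and compare the results. This is only a sketch, assuming PyYAML imported as yaml; the variable names spaced and compact are chosen for the example.

```python
import yaml  # PyYAML; assumed to be installed

# Spaced form: plain keys, a space after every colon.
spaced = yaml.safe_load(r"test: {gender: 0, nacionality: Alem\u00e3o}")

# Compact form: quoted keys, so the space after the colon can be dropped.
compact = yaml.safe_load(r"""test: {"gender":0, 'nacionality':Alem\u00e3o}""")

# Both inputs should describe the same mapping.
print(spaced == compact)
print(compact)
```

The value stays unquoted in both inputs, so its backslash is kept literally in either case.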

Upvotes: 1

deceze

Reputation: 521994

That's how the YAML data format is defined. Within double quotes, specific escape sequences are interpreted. Within single quotes, they're not.

7.3.1. Double-Quoted Style

The double-quoted style is specified by surrounding “"” indicators. This is the only style capable of expressing arbitrary strings, by using “\” escape sequences. This comes at the cost of having to escape the “\” and “"” characters.

http://yaml.org/spec/1.2/spec.html#id2787109


What can I do to get the output as Alem\u00e3o when using "?

Escape the escape character:

test: {"gender":0,"nacionality":"Alem\\u00e3o"}
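
As a small check of this, the sketch below loads that line with PyYAML (assumed to be installed and imported as yaml) and prints the value both directly and via repr().

```python
import yaml  # PyYAML; assumed to be installed

# Raw string: the YAML text really contains two backslashes before u00e3.
source = r'''test: {"gender":0,"nacionality":"Alem\\u00e3o"}'''

doc = yaml.safe_load(source)
value = doc["test"]["nacionality"]

# YAML turns the escaped backslash into one literal backslash.
print(value)        # Alem\u00e3o
# repr() doubles it again for display, as printing the whole dict does.
print(repr(value))  # 'Alem\\u00e3o'
```

The string itself holds a single backslash; the doubled form only appears when Python shows the repr.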

Upvotes: 2
