chaimp

Reputation: 17907

Printing non-ASCII characters in Python/Jinja

The following code works correctly:

from jinja2 import Template

mylist = ['some text \xc3']

template = Template('{{ list }}')

print template.render(list=mylist)

When I run it, it outputs:

['some text \xc3']

Yet, when I try to print the actual list element, it fails:

template = Template('{{ list[0] }}')

print template.render(list=mylist)

The error is:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

I would like to find a way to print the individual list element in the same way that the whole list is printed, where the non-ascii character is represented with the \x notation.

Upvotes: 3

Views: 9506

Answers (4)

jla

Reputation: 10174

From the Jinja docs:

"Jinja2 is using Unicode internally which means that you have to pass Unicode objects to the render function or bytestrings that only consist of ASCII characters."

mylist = [u'some text \xc3']
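When the data arrives as byte strings rather than as literals in the source, one way to follow this advice is to decode it explicitly before rendering. The sketch below is only an illustration: `to_text` is a hypothetical helper, and Latin-1 is an assumed encoding, since the question does not say how the bytes were encoded.

```python
def to_text(value, encoding='latin-1'):
    """Return value as Unicode text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        # The encoding is an assumption; use whatever the data really uses.
        return value.decode(encoding)
    return value


raw_list = [b'some text \xc3']
mylist = [to_text(item) for item in raw_list]
```

The decoded `mylist` can then be passed to `template.render(list=mylist)` as in the question.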

Upvotes: 3

schlamar

Reputation: 9509

You should never open an encoded file and not decode it.

You should either read the encoding from curl (e.g. with the -i or -H option) by parsing the HTTP headers, or read it from the output file itself if the headers do not specify an encoding.

Or, as an alternative to curl, you can use the requests library, which doesn't require writing to a file. Fetching a web resource will look like:

>>> r = requests.get('http://python.org')
>>> r.content
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML...

Where content is already encoded following the HTTP specification.

As a last approach, you could guess an encoding and replace unknown characters. This would be the easiest solution to implement. For example:

with codecs.open(filename, encoding='utf-8', errors='replace') as fobj:
    ...

Your approach will always lose information (if there are non-ASCII characters). My first two approaches never lose information, and the last one does only if the guessed encoding is wrong.
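To make the last approach concrete, here is a small sketch of what `errors='replace'` does to an undecodable byte. The helper name `read_text_lossy` and the temporary file are assumptions made for the demonstration; `io.open` is used because it accepts the same `encoding` and `errors` arguments.

```python
import io
import os
import tempfile


def read_text_lossy(filename, encoding='utf-8'):
    """Read a file as text, replacing undecodable bytes with U+FFFD."""
    with io.open(filename, encoding=encoding, errors='replace') as fobj:
        return fobj.read()


# Write the bytes from the question to a temporary file, then read them back.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as fobj:
        fobj.write(b'some text \xc3')
    text = read_text_lossy(path)
finally:
    os.remove(path)
```

Here the lone byte 0xc3 is not valid UTF-8, so it is replaced by the Unicode replacement character and the original byte value is lost.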

Upvotes: 2

clark

Reputation: 131

jla's answer is right for my case.

I use UTF-8 for my Python source files, so using the u prefix solved my problem.

Upvotes: 0

chaimp

Reputation: 17907

I figured it out. The key is to do str.encode('string-escape')

So, I did this:

template = Template('{{ list[0].encode("string-escape") }}')

And that worked.
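Note that the string-escape codec exists only in Python 2. As a rough sketch of the same idea for Python 3, the hypothetical helper `escape_bytes` below goes through Latin-1 and the standard `unicode_escape` codec. Its output is close to what string-escape produces, but not guaranteed to be identical (quote handling differs, for instance).

```python
def escape_bytes(value):
    """Return a byte string as ASCII text with non-printable bytes shown as \\xNN."""
    # Latin-1 maps every byte 0-255 to the code point with the same number,
    # so this decode step can never fail.
    text = value.decode('latin-1')
    return text.encode('unicode_escape').decode('ascii')


escaped = escape_bytes(b'some text \xc3')
print(escaped)
```

A function like this could be exposed to templates as a custom filter through the Jinja environment's `filters` dictionary, instead of calling a codec inside the template.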

Upvotes: 0
