jez
jez

Reputation: 15359

How can a representation of a literal be safely evaluated, assuming unicode_literals?

In Python 2, I would like to evaluate a string that contains the representation of a literal. I would like to do this safely, so I don't want to use eval()—instead I've become accustomed to using ast.literal_eval() for this kind of task.

However, I also want to evaluate under the assumption that string literals in plain quotes denote unicode objects—i.e. the kind of forward-compatible behavior you get with from __future__ import unicode_literals. In the example below, eval() seems to respect this preference, but ast.literal_eval() seems not to.

from __future__ import unicode_literals, print_function

import ast

raw = r"""   'hello'    """

value = eval(raw.strip())
print(repr(value))
# Prints:
# u'hello'

value = ast.literal_eval(raw.strip())
print(repr(value))
# Prints:
# 'hello'

Note that I'm looking for a general-purpose literal_eval replacement—I don't know in advance that the output is necessarily a string object. I want to be able to assume that raw is the representation of an arbitrary Python literal, which may be a string, or may contain one or more strings, or not.

Is there a way of getting the best of both worlds: a function that both securely evaluates representations of arbitrary Python literals and respects the unicode_literals preference?

Upvotes: 3

Views: 129

Answers (3)

blhsing
blhsing

Reputation: 106946

What makes codes potentially insecure are references to names and/or attributes. You can subclass ast.NodeVisitor to ensure that there are no such references in a given piece of code before you eval it:

import ast
from textwrap import dedent

class Validate(ast.NodeVisitor):
    def visit_Name(self, node):
        raise ValueError("Reference to name '%s' found in expression" % node.id)
    def visit_Attribute(self, node):
        raise ValueError("Reference to attribute '%s' found in expression" % node.attr)

Validate().visit(ast.parse(dedent(raw), '<inline>', 'eval'))
eval(raw)

Upvotes: 1

user2357112
user2357112

Reputation: 281594

Neither ast.literal_eval nor ast.parse offer the option to set compiler flags. You can pass the appropriate flags to compile to parse the string with unicode_literals activated, then run ast.literal_eval on the resulting node:

import ast

# Not a future statement. This imports the __future__ module, and has no special
# effects beyond that.
import __future__

unparsed = '"blah"'
parsed = compile(unparsed,
                 '<string>',
                 'eval',
                 ast.PyCF_ONLY_AST | __future__.unicode_literals.compiler_flag)
value = ast.literal_eval(parsed)

Upvotes: 4

wim
wim

Reputation: 363183

An interesting question. I'm not sure if there is any solution for ast.literal_eval, here, but I offer a cheap/safe workaround:

def my_literal_eval(s):
    ast.literal_eval(s)
    return eval(s)

Upvotes: 4

Related Questions