meder omuraliev
meder omuraliev

Reputation: 186562

RestrictedPython: Call other functions within user-specified code?

Using Yuri Nudelman's code with the custom _import definition to specify modules to restrict serves as a good base but when calling functions within said user_code naturally due to having to whitelist everything is there any way to permit other user defined functions to be called? Open to other sandboxing solutions although Jupyter didn't seem straight-forward to embed within a web interface.

from RestrictedPython import safe_builtins, compile_restricted
from RestrictedPython.Eval import default_guarded_getitem

def _import(name, globals=None, locals=None, fromlist=(), level=0):
    safe_modules = ["math"]
    if name in safe_modules:
       globals[name] = __import__(name, globals, locals, fromlist, level)
    else:
        raise Exception("Don't you even think about it {0}".format(name))

safe_builtins['__import__'] = _import # Must be a part of builtins

def execute_user_code(user_code, user_func, *args, **kwargs):
    """ Executed user code in restricted env
        Args:
            user_code(str) - String containing the unsafe code
            user_func(str) - Function inside user_code to execute and return value
            *args, **kwargs - arguments passed to the user function
        Return:
            Return value of the user_func
    """

    def _apply(f, *a, **kw):
        return f(*a, **kw)

    try:
        # This is the variables we allow user code to see. @result will contain return value.
        restricted_locals = { 
            "result": None,
            "args": args,
            "kwargs": kwargs,
        }   

        # If you want the user to be able to use some of your functions inside his code,
        # you should add this function to this dictionary.
        # By default many standard actions are disabled. Here I add _apply_ to be able to access
        # args and kwargs and _getitem_ to be able to use arrays. Just think before you add
        # something else. I am not saying you shouldn't do it. You should understand what you
        # are doing thats all.
        restricted_globals = { 
            "__builtins__": safe_builtins,
            "_getitem_": default_guarded_getitem,
            "_apply_": _apply,
        }   

        # Add another line to user code that executes @user_func
        user_code += "\nresult = {0}(*args, **kwargs)".format(user_func)

        # Compile the user code
        byte_code = compile_restricted(user_code, filename="<user_code>", mode="exec")

        # Run it
        exec(byte_code, restricted_globals, restricted_locals)
       # User code has modified result inside restricted_locals. Return it.
        return restricted_locals["result"]

    except SyntaxError as e:
        # Do whaever you want if the user has code that does not compile
        raise
    except Exception as e:
        # The code did something that is not allowed. Add some nasty punishment to the user here.
        raise

i_example = """
import math

def foo():
    return 7

def myceil(x):
    return math.ceil(x)+foo()
"""
print(execute_user_code(i_example, "myceil", 1.5))

Running this returns 'foo' is not defined

Upvotes: 6

Views: 1874

Answers (2)

dumbass
dumbass

Reputation: 27214

First of all, the replacement for the __import__ built-in is implemented incorrectly. That built-in is supposed to return the imported module, not mutate the globals to include it:

Python 3.9.12 (main, Mar 24 2022, 13:02:21)
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> __import__('math')
<module 'math' (built-in)>
>>> math
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'math' is not defined

A better way to reimplement __import__ would be this:

_SAFE_MODULES = frozenset(("math",))

def _safe_import(name, *args, **kwargs):
    if name not in _SAFE_MODULES:
        raise Exception(f"Don't you even think about {name!r}")
    return __import__(name, *args, **kwargs)

The fact that you mutated globals in your original implementation was partially masking the primary bug. Namely: name assignments within restricted code (function definitions, variable assignments and imports) mutate the locals dict, but name look-ups are by default done as global look-ups, bypassing the locals entirely. You can see this by disassembling the restricted bytecode using __import__('dis').dis(byte_code):

  2           0 LOAD_CONST               0 (0)
              2 LOAD_CONST               1 (None)
              4 IMPORT_NAME              0 (math)
              6 STORE_NAME               0 (math)

  4           8 LOAD_CONST               2 (<code object foo at 0x7fbef4eef3a0, file "<user_code>", line 4>)
             10 LOAD_CONST               3 ('foo')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               1 (foo)

  7          16 LOAD_CONST               4 (<code object myceil at 0x7fbef4eef660, file "<user_code>", line 7>)
             18 LOAD_CONST               5 ('myceil')
             20 MAKE_FUNCTION            0
             22 STORE_NAME               2 (myceil)
             24 LOAD_CONST               1 (None)
             26 RETURN_VALUE

Disassembly of <code object foo at 0x7fbef4eef3a0, file "<user_code>", line 4>:
  5           0 LOAD_CONST               1 (7)
              2 RETURN_VALUE

Disassembly of <code object myceil at 0x7fbef4eef660, file "<user_code>", line 7>:
  8           0 LOAD_GLOBAL              0 (_getattr_)
              2 LOAD_GLOBAL              1 (math)
              4 LOAD_CONST               1 ('ceil')
              6 CALL_FUNCTION            2
              8 LOAD_FAST                0 (x)
             10 CALL_FUNCTION            1
             12 LOAD_GLOBAL              2 (foo)
             14 CALL_FUNCTION            0
             16 BINARY_ADD
             18 RETURN_VALUE

The documentation for exec explains (emphasis mine):

If only globals is provided, it must be a dictionary (and not a subclass of dictionary), which will be used for both the global and the local variables. If globals and locals are given, they are used for the global and local variables, respectively. If provided, locals can be any mapping object. Remember that at the module level, globals and locals are the same dictionary. If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition.

This makes separate mappings for locals and globals completely spurious. We can therefore simply get rid of the locals dict, and put everything in globals. The entire code should look something like this:

from RestrictedPython import safe_builtins, compile_restricted


_SAFE_MODULES = frozenset(("math",))


def _safe_import(name, *args, **kwargs):
    if name not in _SAFE_MODULES:
        raise Exception(f"Don't you even think about {name!r}")
    return __import__(name, *args, **kwargs)


def execute_user_code(user_code, user_func, *args, **kwargs):
    my_globals = {
        "__builtins__": {
            **safe_builtins,
            "__import__": _safe_import,
        },
    }

    try:
        byte_code = compile_restricted(
            user_code, filename="<user_code>", mode="exec")
    except SyntaxError:
        # syntax error in the sandboxed code
        raise

    try:
        exec(byte_code, my_globals)
        return my_globals[user_func](*args, **kwargs)
    except BaseException:
        # runtime error (probably) in the sandboxed code
        raise

Above I also managed to fix a couple of tangential issues:

  • Instead of injecting the function call into the compiled snippet, I look up the function in the globals dict directly. This avoids a potential code injection vector if user_func happens to come from an untrusted source, and avoids having to inject args, kwargs and result into the sandbox, which would enable sandboxed code to clobber it.
  • I avoid mutating the safe_builtins object provided by the RestrictedPython module. Otherwise, if any other code within your program happens to be using RestrictedPython, it might have been affected.
  • I split the exception handling between the two steps: compilation and execution. This minimises the probability that bugs in the sandbox code will be misattributed to the sandboxed code.
  • I changed the caught runtime exception type to BaseException, to also catch cases when sandboxed code attempts to raise KeyboardInterrupt or SystemExit (which do not derive from Exception, but only BaseException).
  • I also removed references to _getitem_ and _apply_, which don’t seem to be used for anything. If they turn out to be necessary after all, you may restore them.

(Note, however, that this still does not protect against DoS via infinite loops within the sandbox.)

Upvotes: 7

aaron
aaron

Reputation: 43083

In exec(byte_code, restricted_globals, restricted_locals):

  1. def foo(): modifies restricted_locals.
  2. myceil can only see its globals, i.e. myceil.__globals__, which is restricted_globals.

You can update f.__globals__/restricted_globals in _apply:

def _apply(f, *a, **kw):
    for k, v in restricted_locals.items():
        if k not in _restricted_locals_keys and k not in f.__globals__:
            f.__globals__[k] = v
    return f(*a, **kw)

where _restricted_locals_keys is:

restricted_locals = {
    "result": None,
    "args": args,
    "kwargs": kwargs,
}
_restricted_locals_keys = set(restricted_locals.keys())

If you don't want to modify restricted_globals, then copy f with new globals in _apply:

import copy
import types

def _copy_func(f, func_globals=None):
    func_globals = func_globals or f.__globals__
    return types.FunctionType(f.__code__, func_globals, name=f.__name__, argdefs=f.__defaults__, closure=f.__closure__)

def _apply(f, *a, **kw):
    func_globals = copy.copy(f.__globals__)
    for k, v in restricted_locals.items():
        if k not in _restricted_locals_keys and k not in func_globals:
            func_globals[k] = v
    f = _copy_func(f, func_globals=func_globals)
    return f(*a, **kw)

Upvotes: 2

Related Questions