crestor

Reputation: 1466

interoperability between Python and R

What is the intended way to read R variables from Python (using reticulate) from inside an R function? Since the Python session cannot access variables that live in the function's environment, is copying them to the global environment the only way?

library(reticulate)
library(glue)
library(tidyverse)

df1 <- data.frame(col1 = c(123, 234), col2 = c(233, 283))

py_run_string("print (r.df1)")
#>     col1   col2
#> 0  123.0  233.0
#> 1  234.0  283.0

fun <- function(x) {
  # create a new variable with a random name in the global environment
  tmp_var_name <- str_c(sample(letters, 30, replace = TRUE), collapse = "")
  message(tmp_var_name)
  assign(tmp_var_name, x, envir = .GlobalEnv)
  
  # Python can read from that global variable
  py_run_string(glue("print (r.{tmp_var_name})"))
  
  # finally, delete the variable
  remove(list = tmp_var_name, envir = .GlobalEnv)
}

fun(df1)
#> vemcjnbxvnfvbdgushqkjcmtgzwhpu
#>     col1   col2
#> 0  123.0  233.0
#> 1  234.0  283.0

Upvotes: 0

Views: 678

Answers (2)

alistaire

Reputation: 43344

The docs weren't helping, so I went to the source, which led me to the internal py_resolve_envir() function. In the question's example it returns the R global environment, but it won't always.

In particular, its first section is

  # if an environment has been set, use it
  envir <- getOption("reticulate.engine.environment")
  if (is.environment(envir))
    return(envir)

meaning you can set the option reticulate.engine.environment to an environment, and reticulate will search that environment, instead of the global one, when you access r from Python.

Thus, you can write:

set.seed(47L)
df1 <- data.frame(col1 = c(123, 234), col2 = c(233, 283))

fun <- function(x) {
  e <- new.env()
  # point reticulate at `e`; restore the previous option value on exit,
  # even if the Python call below throws an error
  old <- options("reticulate.engine.environment" = e)
  on.exit(options(old), add = TRUE)
  
  # create a new variable with a random name
  tmp_var_name <- paste(sample(letters, 30, replace = TRUE), collapse = "")
  message(tmp_var_name)
  assign(tmp_var_name, x, envir = e)
  
  res <- reticulate::py_run_string(glue::glue("print( r.{tmp_var_name} )"))
  invisible(res)
}

fun(df1)
#> zjtvorkmoydsepnxkabmeondrjaanu
#>     col1   col2
#> 0  123.0  233.0
#> 1  234.0  283.0

to avoid needing to put everything in the global environment.


Coda: Arbitrary R code execution from Python

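Calling dir() on r shows that it defines both __getattr__ and __getitem__. Attribute access such as r.{tmp_var_name} goes through __getattr__ and, as the examples show, item access through __getitem__ evaluates names the same way: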

reticulate::py_run_string('print( dir(r) )')
#> ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']

reticulate::py_run_string('print( r.__getitem__ )')
#> <bound method make_python_function.<locals>.python_function of <__main__.R object at 0x111b0b6a0>>

x <- 47L
reticulate::py_run_string('print( r.__getitem__("x") )')
#> 47

The definition is in the reticulate source. Notably, getter() names its second parameter code for a reason:

  getter <- function(self, code) {
    envir <- py_resolve_envir()
    object <- eval(parse(text = as_r_value(code)), envir = envir)
    r_to_py(object, convert = is.function(object))
  }

That parameter gets passed to eval(parse(text = ...)), which turns any string into R code and runs it. So you can pass any R code to r.__getitem__(), with the limitation that it must return something that can be converted to a Python type (e.g. not an environment or a model), and the more superficial limitation that it cannot contain newlines. That still lets you execute arbitrary R code:

reticulate::py_run_string("print( r.__getitem__('head(iris)') )")
#>    Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
#> 0           5.1          3.5           1.4          0.2  setosa
#> 1           4.9          3.0           1.4          0.2  setosa
#> 2           4.7          3.2           1.3          0.2  setosa
#> 3           4.6          3.1           1.5          0.2  setosa
#> 4           5.0          3.6           1.4          0.2  setosa
#> 5           5.4          3.9           1.7          0.4  setosa

reticulate::py_run_string("print( r.__getitem__('broom::tidy(lm(mpg ~ hp, mtcars))') )")
#>           term   estimate  std.error  statistic       p.value
#> 0  (Intercept)  30.098861   1.633921  18.421246  6.642736e-18
#> 1           hp  -0.068228   0.010119  -6.742389  1.787835e-07

# this method also gets called if you subset `r` with `[]`:
reticulate::py_run_string("print( r['library(tidyverse); mtcars %>% group_by(cyl) %>% summarise(across(everything(), mean))'] )")
#>    cyl        mpg        disp    ...           am      gear      carb
#> 0  4.0  26.663636  105.136364    ...     0.727273  4.090909  1.545455
#> 1  6.0  19.742857  183.314286    ...     0.428571  3.857143  3.428571
#> 2  8.0  15.100000  353.100000    ...     0.142857  3.285714  3.500000
#> 
#> [3 rows x 11 columns]

This code is evaluated in whatever environment py_resolve_envir() returns, but if you can access (or make!) the thing you want from there, you can grab it.
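The dispatch described above can be mimicked in plain Python. This is only an analogy under stated assumptions, not reticulate's implementation: CodeEvaluatingProxy and its namespace dict are hypothetical names, and Python's built-in eval() stands in for R's eval(parse(text = ...)). It sketches why routing both attribute and item access to a getter that evaluates its argument lets a caller run arbitrary expressions.

```python
# Pure-Python analogy of the `r` object; NOT reticulate's actual code.
class CodeEvaluatingProxy:
    """Hypothetical stand-in for reticulate's `r` object."""

    def __init__(self, namespace):
        # Plays the role of the environment returned by py_resolve_envir().
        self._namespace = namespace

    def _getter(self, code):
        # Python's eval() stands in for R's eval(parse(text = code)).
        return eval(code, {}, self._namespace)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, e.g. proxy.x
        if name.startswith("_"):
            raise AttributeError(name)
        return self._getter(name)

    def __getitem__(self, code):
        # Reached for proxy["..."]; any expression string is evaluated.
        return self._getter(code)


proxy = CodeEvaluatingProxy({"x": 47})
print(proxy.x)                     # 47
print(proxy["x"])                  # 47
print(proxy["x * 2 + 1"])          # 95
print(proxy["sorted([3, 1, 2])"])  # [1, 2, 3]
```

The last two calls are the point: nothing forces the "name" to be a name, so whatever string reaches the getter is run as code.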

This also feels enough like SQL injection that you really shouldn't let a user pick variable names if you're running this in Shiny or similar, though I don't expect that's likely anyway.
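If a name ever does come from outside, one option is to vet it before interpolating it into the code string. The following is a minimal sketch, not part of reticulate: is_safe_r_name() and r_lookup_code() are hypothetical helpers, and the pattern is a deliberately strict ASCII subset of R's syntactic names (it must start with a letter), combined with the reserved words listed in R's ?Reserved help page.

```python
import re

# Deliberately strict subset of R's syntactic names (assumption: ASCII only,
# must start with a letter, then letters, digits, "." or "_").
_SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._]*")

# Reserved words that could otherwise pass the pattern above.
_R_RESERVED = frozenset({
    "if", "else", "repeat", "while", "function", "for", "in", "next",
    "break", "TRUE", "FALSE", "NULL", "Inf", "NaN", "NA",
    "NA_integer_", "NA_real_", "NA_complex_", "NA_character_",
})


def is_safe_r_name(name: str) -> bool:
    """Return True only for plain identifiers, never for expressions."""
    return bool(_SAFE_NAME.fullmatch(name)) and name not in _R_RESERVED


def r_lookup_code(name: str) -> str:
    """Build the Python snippet 'r.<name>' only for vetted names."""
    if not is_safe_r_name(name):
        raise ValueError(f"refusing unsafe R variable name: {name!r}")
    return f"r.{name}"


print(is_safe_r_name("df1"))         # True
print(is_safe_r_name("head(iris)"))  # False
print(r_lookup_code("tmp.var_2"))    # r.tmp.var_2
```

The randomly generated 30-letter names used in the question and answers already satisfy this check; it only matters once the name is no longer under your control.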

Upvotes: 2

akrun

Reputation: 887221

Just assign the value to a Python variable inside the function and extract it as a list with py$:

fun <- function(x) {
  # create a new variable with a random name in the global environment
  tmp_var_name <- str_c(sample(letters, 30, replace = TRUE), collapse = "")
  message(tmp_var_name)
  assign(tmp_var_name, x, envir = .GlobalEnv)
  
  # Python can read from that global variable
  py_run_string(glue("x1 = r.{tmp_var_name}"))
  
  # finally, delete the variable
  remove(list = tmp_var_name, envir = .GlobalEnv)
  py$x1
}

Testing:

fun(df1)
#galdlxvgjpxuzkvmwznspxjdrftcmu
#$col2
#[1] 233 283

#$col1
#[1] 123 234

Upvotes: 1
