Reputation: 1466
What is the intended way to read R variables from Python (using reticulate) from inside an R function?
Since the Python session cannot access variables that live in the function's environment, is copying them to the global environment the only way?
library(reticulate)
library(glue)
library(tidyverse)
df1 <- data.frame(col1 = c(123, 234), col2 = c(233, 283))
py_run_string("print (r.df1)")
#> col1 col2
#> 0 123.0 233.0
#> 1 234.0 283.0
fun <- function(x) {
  # create a new variable with a random name in the global environment
  tmp_var_name <- str_c(sample(letters, 30, replace = TRUE), collapse = "")
  message(tmp_var_name)
  assign(tmp_var_name, x, envir = .GlobalEnv)
  # Python can read from that global variable
  py_run_string(glue("print (r.{tmp_var_name})"))
  # finally, delete the variable
  remove(list = tmp_var_name, envir = .GlobalEnv)
}
fun(df1)
#> vemcjnbxvnfvbdgushqkjcmtgzwhpu
#> col1 col2
#> 0 123.0 233.0
#> 1 234.0 283.0
Upvotes: 0
Views: 678
Reputation: 43344
The docs weren't helping, so I went to the source, which led me to the internal function py_resolve_envir(). In the question's example it returns the R global environment, but it won't always.
In particular, its first section is
# if an environment has been set, use it
envir <- getOption("reticulate.engine.environment")
if (is.environment(envir))
return(envir)
meaning you can set an option called reticulate.engine.environment to an environment, and reticulate will search that environment, instead of the global environment, when you access something through r in Python.
Thus, you can write:
set.seed(47L)
df1 <- data.frame(col1 = c(123, 234), col2 = c(233, 283))
fun <- function(x) {
  e <- new.env()
  # point reticulate's `r` lookups at `e`; restore the old option on exit, even on error
  old <- options("reticulate.engine.environment" = e)
  on.exit(options(old), add = TRUE)
  # create a new variable with a random name
  tmp_var_name <- paste(sample(letters, 30, replace = TRUE), collapse = "")
  message(tmp_var_name)
  assign(tmp_var_name, x, envir = e)
  res <- reticulate::py_run_string(glue::glue("print( r.{tmp_var_name} )"))
  invisible(res)
}
fun(df1)
#> zjtvorkmoydsepnxkabmeondrjaanu
#> col1 col2
#> 0 123.0 233.0
#> 1 234.0 283.0
This avoids needing to put anything in the global environment.
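If you call dir() on r, you can see it has __getattr__ and __getitem__ methods defined. In Python, attribute access such as r.{tmp_var_name} is dispatched to __getattr__ and subscripting to __getitem__; as far as I can tell, both end up in the same getter: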
reticulate::py_run_string('print( dir(r) )')
#> ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']
reticulate::py_run_string('print( r.__getitem__ )')
#> <bound method make_python_function.<locals>.python_function of <__main__.R object at 0x111b0b6a0>>
x <- 47L
reticulate::py_run_string('print( r.__getitem__("x") )')
#> 47
The definition of this is in the reticulate source. Notably, getter() names its second parameter code for a reason:
getter <- function(self, code) {
  envir <- py_resolve_envir()
  object <- eval(parse(text = as_r_value(code)), envir = envir)
  r_to_py(object, convert = is.function(object))
}
The string gets passed to eval(parse(text = ...)), which parses it as R code and runs it. That means you can pass any R code to r.__getitem__(), with the limitation that the result must be convertible to a Python type (e.g. not an environment or a model), and the more superficial limitation that the code cannot contain newlines. But that still lets you execute arbitrary code in R:
reticulate::py_run_string("print( r.__getitem__('head(iris)') )")
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 0 5.1 3.5 1.4 0.2 setosa
#> 1 4.9 3.0 1.4 0.2 setosa
#> 2 4.7 3.2 1.3 0.2 setosa
#> 3 4.6 3.1 1.5 0.2 setosa
#> 4 5.0 3.6 1.4 0.2 setosa
#> 5 5.4 3.9 1.7 0.4 setosa
reticulate::py_run_string("print( r.__getitem__('broom::tidy(lm(mpg ~ hp, mtcars))') )")
#> term estimate std.error statistic p.value
#> 0 (Intercept) 30.098861 1.633921 18.421246 6.642736e-18
#> 1 hp -0.068228 0.010119 -6.742389 1.787835e-07
# this method also gets called if you subset `r` with `[]`:
reticulate::py_run_string("print( r['library(tidyverse); mtcars %>% group_by(cyl) %>% summarise(across(everything(), mean))'] )")
#> cyl mpg disp ... am gear carb
#> 0 4.0 26.663636 105.136364 ... 0.727273 4.090909 1.545455
#> 1 6.0 19.742857 183.314286 ... 0.428571 3.857143 3.428571
#> 2 8.0 15.100000 353.100000 ... 0.142857 3.285714 3.500000
#>
#> [3 rows x 11 columns]
This code is evaluated in whatever environment py_resolve_envir() returns, but if you can access (or make!) the thing you want from there, you can grab it.
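To make the mechanism concrete, here is a pure-Python sketch of a proxy object that behaves roughly like r. It is an analogy and not reticulate's code: the class name EvalProxy and the env dictionary are hypothetical, Python's eval() stands in for R's eval(parse(text = ...)), and it assumes (as the quoted getter suggests) that attribute access and subscripting share one getter.

```python
class EvalProxy:
    """Toy stand-in for reticulate's `r` object (hypothetical, not reticulate code)."""

    def __init__(self, env):
        # The dictionary plays the role of the R environment being searched.
        self._env = env

    def _getter(self, code):
        # Evaluate the string as an expression against the stored environment,
        # analogous to eval(parse(text = code), envir = envir) on the R side.
        return eval(code, {}, self._env)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, e.g. proxy.x
        if name.startswith("_"):
            # Avoid infinite recursion if a private attribute is missing.
            raise AttributeError(name)
        return self._getter(name)

    def __getitem__(self, code):
        # proxy["..."] accepts any expression string, not just a name.
        return self._getter(code)


env = {"x": 47, "values": [3, 1, 2]}
proxy = EvalProxy(env)

print(proxy.x)                  # plain name lookup via __getattr__
print(proxy["x + 1"])           # arbitrary expression via __getitem__
print(proxy["sorted(values)"])  # function calls are evaluated too
```

The point of the sketch is that the "name" is never checked to be a name: whatever string arrives is evaluated, which is why the subscript form can run whole expressions.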
Also, this feels enough like a SQL injection attack that you really shouldn't let a user pick variable names if you're running this in Shiny or similar, though I don't expect that's likely anyway.
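If a name ever does come from outside, one possible guard is to validate it before interpolating it into the string handed to Python. This is only a sketch under assumptions: the helper r_lookup_expr is hypothetical (it is not part of reticulate), and the pattern is deliberately stricter than what R accepts as a name, since the result also has to be a valid Python attribute name.

```python
import keyword
import re

# Conservative allow-list: ASCII letters, digits and underscores, starting
# with a letter. Dotted R names are rejected because `r.a.b` would be parsed
# by Python as nested attribute access.
_SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def r_lookup_expr(name):
    """Build the Python expression `r.<name>`, refusing anything but a plain name."""
    if not _SAFE_NAME.fullmatch(name) or keyword.iskeyword(name):
        raise ValueError(f"refusing to interpolate unsafe name: {name!r}")
    return f"r.{name}"


print(r_lookup_expr("df1"))
```

The returned string could then be embedded in the code passed to py_run_string() in place of a raw glue() interpolation.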
Upvotes: 2
Reputation: 887221
Just do an assignment on the Python side inside the function and extract the resulting list with py$:
fun <- function(x) {
  # create a new variable with a random name in the global environment
  tmp_var_name <- str_c(sample(letters, 30, replace = TRUE), collapse = "")
  message(tmp_var_name)
  assign(tmp_var_name, x, envir = .GlobalEnv)
  # Python can read from that global variable
  py_run_string(glue("x1 = r.{tmp_var_name}"))
  # finally, delete the variable
  remove(list = tmp_var_name, envir = .GlobalEnv)
  py$x1
}
Testing:
fun(df1)
#galdlxvgjpxuzkvmwznspxjdrftcmu
#$col2
#[1] 233 283
#$col1
#[1] 123 234
Upvotes: 1