Georg Heiler

Reputation: 17676

rpy2 in jupyter is echoing the whole function

Whilst trying to connect Python with R, I stumbled upon a minimal example:

import rpy2.robjects as robjects
from rpy2.robjects import FloatVector
from rpy2.robjects.packages import importr
stats = importr('stats')
base = importr('base')

ctl = FloatVector([4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14])
trt = FloatVector([4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69])
group = base.gl(2, 10, 20, labels = ["Ctl","Trt"])
weight = ctl + trt

robjects.globalenv["weight"] = weight
robjects.globalenv["group"] = group
lm_D9 = stats.lm("weight ~ group")
print(stats.anova(lm_D9))

# omitting the intercept
lm_D90 = stats.lm("weight ~ group - 1")
print(base.summary(lm_D90))

This runs without errors, but my output looks like:

Analysis of Variance Table

Response: weight
          Df Sum Sq Mean Sq F value Pr(>F)
group      1 0.6882 0.68820  1.4191  0.249
Residuals 18 8.7293 0.48496               


Call:
(function (formula, data, subset, weights, na.action, method = "qr", 
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
    contrasts = NULL, offset, ...) 
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "weights", "na.action", 
        "offset"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- quote(stats::model.frame)
    mf <- eval(mf, parent.frame())
    if (method == "model.frame") 
        return(mf)
    else if (method != "qr") 
        warning(gettextf("method = '%s' is not supported. Using 'qr'", 
            method), domain = NA)
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    .....

meaning the whole function is echoed back to me. Can I set a different log level somewhere?

Upvotes: 0

Views: 32

Answers (1)

lgautier

Reputation: 11545

This happens because the calling expression (stats.lm("weight ~ group - 1")) is first evaluated in Python before anything is passed to R, and the summary method reports the call as R received it.

In other words, stats.lm is first evaluated in Python, which returns the R function lm itself (its code), and that function is then called with your argument "weight ~ group - 1". Think of it as if R saw an anonymous function being called, in the form (function(myformula) { <do things> })("weight ~ group - 1").

One way to avoid this is to evaluate an R expression in which R itself resolves the symbol lm to the function at call time. The simplest would be:

robjects.globalenv['myformula'] = "weight ~ group - 1"
lm_D90 = robjects.reval("lm(myformula)")

Note that the symbols needed for your call can be bundled in a namespace / environment (which might be tidier than putting everything in globalenv):

myenv = robjects.Environment()
myenv['myformula'] = "weight ~ group - 1"
lm_D90 = robjects.reval("lm(myformula)", myenv)

Alternatively, one might find it more elegant to first build an unevaluated R expression for the lm() call programmatically and then evaluate it.
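A rough sketch of that last idea follows. It builds the call as R source text, which is a simpler stand-in for constructing a true R language object, and then lets R parse and evaluate it so that lm is looked up by name on the R side. The helpers build_lm_call and fit_in_r are hypothetical names made up for this illustration; the only rpy2 call used is robjects.r, and the sketch assumes weight and group are already in globalenv as in the question.

```python
def build_lm_call(formula, data=None):
    """Return R source text for an lm() call (hypothetical helper)."""
    args = [formula]
    if data is not None:
        # 'data' is the name of an R symbol, not a Python object
        args.append(f"data = {data}")
    return f"lm({', '.join(args)})"


def fit_in_r(call_text):
    """Let R parse and evaluate the call text; requires rpy2 and R."""
    # Imported lazily so that the string builder can be used without R.
    import rpy2.robjects as robjects
    return robjects.r(call_text)


call_text = build_lm_call("weight ~ group - 1")
print(call_text)
```

With rpy2 available, lm_D90 = fit_in_r(call_text) followed by print(base.summary(lm_D90)) should then show the call as it was written rather than the body of lm, since R resolved the symbol itself.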

Upvotes: 1
