aiedu

Reputation: 142

Error calling an R function from Python using rpy2 with the survival library

When I call a function from R's survival package in Python through the rpy2 interface, I get the following error:

RRuntimeError: Error in formula[[2]] : subscript out of bounds

Any pointers on how to solve this?

Thanks

Code:

import pandas as pd
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
R = ro.r
from rpy2.robjects import pandas2ri

pandas2ri.activate()


## install the survival package
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1) # select the first mirror in the list
utils.install_packages(StrVector(['survival'])) # pass a list of names, not a bare string


#Load the library and example data set
survival=importr('survival')
infert = R('infert')

## Linear model works fine
reslm=R.lm('case~spontaneous+induced',data=infert)

#Run the example clogit function, which fails
rescl=R.clogit('case~spontaneous+induced+strata(stratum)',data=infert)

Upvotes: 0

Views: 1181

Answers (2)

jackinovik

Reputation: 869

This fails when including the strata() function within the formula because it's not evaluated in the right environment. In R, formulas are special language constructs and so they need to be treated separately by rpy2.

So, for your example, this would look like:

rescl = R.clogit(ro.Formula('case ~ spontaneous + induced + strata(stratum)'),
                 data = infert)

See the documentation for rpy2.robjects.Formula for more details. It also discusses the pros and cons of this approach versus the one given by @Gwang-jin-kim.
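For completeness, here is a minimal sketch of the whole call, assuming the survival package is already installed. The wrapper name fit_clogit_with_formula is hypothetical, and the rpy2 imports sit inside the function only so that the snippet can be loaded on a machine without R.

```python
def fit_clogit_with_formula():
    """Fit the clogit example by passing an R formula object instead of a str."""
    # Imported lazily (an assumption of this sketch) so that merely defining
    # the function does not require R or rpy2 to be present.
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr

    survival = importr("survival")  # package that provides clogit and strata
    infert = ro.r("infert")         # example data set shipped with R

    # ro.Formula builds an R formula object; a plain Python str would
    # reach R as a character vector instead.
    formula = ro.Formula("case ~ spontaneous + induced + strata(stratum)")
    return survival.clogit(formula, data=infert)
```

Calling fit_clogit_with_formula() should return the fitted model object, which you can then hand to other R functions.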

Upvotes: 0

Gwang-Jin Kim

Reputation: 9865

After some experimenting, I found that it makes a difference whether or not you hand rpy2's R instance the complete R code as a single string to execute.

So you can get your function to run by expressing as much as possible as R code:

#Run the example clogit function, which fails
rescl=R.clogit('case~spontaneous+induced+strata(stratum)',data=infert)

#But give the R code to be executed as one complete string - this works:
rescl=R('clogit(case ~ spontaneous + induced + strata(stratum), data = infert)')
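If the terms of the formula come from Python variables, the code string can be assembled first and then passed to R. This is only a sketch: build_clogit_call is a hypothetical helper, not part of rpy2, and it does no validation or escaping of the names it receives.

```python
def build_clogit_call(response, predictors, strata, data_name):
    """Return R source code for a clogit call as a single string."""
    # Join the predictors and append the strata() term, as in the example above.
    terms = list(predictors) + [f"strata({strata})"]
    formula = f"{response} ~ " + " + ".join(terms)
    return f"clogit({formula}, data = {data_name})"


code = build_clogit_call("case", ["spontaneous", "induced"], "stratum", "infert")
print(code)
# clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
```

The resulting string can then be run with R(code), exactly like the hand-written string in the working call above.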

If you assign the return value to a variable inside R, you can inspect it and extract the key information about the model with the usual R functions.

E.g.

R('rescl.in.R <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)')

R('str(rescl.in.R)')

# or:
R('coef(rescl.in.R)')
## array([1.98587552, 1.40901163])

R('names(rescl.in.R)') 
## array(['coefficients', 'var', 'loglik', 'score', 'iter',
##        'linear.predictors', 'residuals', 'means', 'method', 'n', 'nevent',
##        'terms', 'assign', 'wald.test', 'y', 'formula', 'xlevels', 'call',
##        'userCall'], dtype='<U17')

It helps a lot, at least in this first phase of using rpy2 (for me too), to keep an R session open and try the same code there in parallel: the output in R is far more readable, and you can see what you are doing and which parts of the result you can access. In Python, the output is stripped of important information (such as the element names), and it is not pretty-printed either.

Upvotes: 4
