Joshua Mannheimer

Reputation: 153

Accessing a R user defined function in Python

I need to do Principal Component Regression (PCR) with cross-validation, and I could not find a Python package that does so. I wrote my own PCR class, but when tested against R's pls package it performs significantly worse and is much slower on high-dimensional data (~50000 features). I am still not sure why, but that is another question. Because all of my other code is in Python, and in the interest of saving time, I decided the best approach might be to write an R function that uses R's pls package. Here is the function:

R_pls <-function(X_train,y_train,X_test){
  library(pls)
  X<-as.matrix(X_train)
  y<-as.matrix(y_train)
  tdata<-data.frame(y,X=I(X))
  REGmodel <- pcr(y~X,scale=FALSE,data=tdata,validation="CV")
  B<-RMSEP(REGmodel)
  C<-B[[1]]
  q<-length(C)
  degs<-c(1:q)
  allvals<-C[degs%%2==0]
  allvals<-allvals[-1]
  comps<-which.min(allvals)
  xt<-as.matrix(X_test)
  ndata<-data.frame(X=I(xt))
  ypred_test<-as.data.frame(predict(REGmodel,ncomp=comps,newdata=ndata,se.fit=TRUE))
  ntdata<-data.frame(X=I(X))
  ypred_train<-as.data.frame(predict(REGmodel,ncomp=comps,newdata=ntdata,se.fit=TRUE))
  data_out=list(ypred_test=ypred_test,ypred_train=ypred_train)
  return(data_out)
}

I have found a good amount of information on how to access R built-in functions, but cannot really find anything for this situation. So I tried the following:

import rpy2.robjects as ro
prs=ro('R_pls')

where R_pls is the R function above. This produces

TypeError: 'module' object is not callable.

Any idea how I might get this to work? I am also open to suggestions if there is a better method.

Thanks

Upvotes: 4

Views: 1757

Answers (1)

Parfait

Reputation: 107587

Consider importing the arbitrary R user-defined function as a package with rpy2's SignatureTranslatedAnonymousPackage (STAP):

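# numpy2ri converts NumPy arrays to R objects (named numpy2rpy in rpy2 3.x);
# pandas2ri lives in its own module, rpy2.robjects.pandas2ri, and is not needed here
from rpy2.robjects.numpy2ri import numpy2ri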
from rpy2.robjects.packages import STAP
# for rpy2 < 2.6.1
# from rpy2.robjects.packages import SignatureTranslatedAnonymousPackage as STAP    

r_fct_string = """    
R_pls <- function(X_train, y_train, X_test){
  library(pls)

  X <- as.matrix(X_train)
  y <- as.matrix(y_train)
  xt <- as.matrix(X_test)

  tdata <- data.frame(y,X=I(X))
  REGmodel <- pls::pcr(y~X,scale=FALSE,data=tdata,validation="CV")
  B <- RMSEP(REGmodel)
  C <- B[[1]]
  q <- length(C)
  degs <- c(1:q)
  allvals <- C[degs%%2==0]
  allvals <- allvals[-1]
  comps <- which.min(allvals)
  ndata <- data.frame(X=I(xt))

  ypred_test <- as.data.frame(predict(REGmodel,ncomp=comps,newdata=ndata,se.fit=TRUE))
  ntdata <- data.frame(X=I(X))
  ypred_train <- as.data.frame(predict(REGmodel,ncomp=comps,newdata=ntdata,se.fit=TRUE))
  data_out <- list(ypred_test=ypred_test, ypred_train=ypred_train)

  return(data_out)
}
"""

r_pkg = STAP(r_fct_string, "r_pkg")

# CONVERT PYTHON NUMPY MATRICES TO R OBJECTS
r_X_train, r_y_train, r_X_test = map(numpy2ri, [py_X_train, py_y_train, py_X_test])

# PASS R OBJECTS INTO FUNCTION (WILL NEED TO EXTRACT DFs FROM RESULT)
p_res = r_pkg.R_pls(r_X_train, r_y_train, r_X_test)
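One detail of the conversion step is worth spelling out: when map is given several iterables, it zips them and passes one element from each as separate arguments, so a one-argument converter needs its inputs wrapped in a single list. The sketch below shows both behaviours with to_r_stub, a hypothetical stand-in for the NumPy-to-R converter, and plain nested lists instead of NumPy arrays (both are assumptions made so the snippet runs without rpy2 or R installed).

```python
def to_r_stub(array):
    """Hypothetical one-argument stand-in for the NumPy-to-R converter."""
    return ("r_object", array)


# Small made-up inputs, only to illustrate the call pattern.
py_X_train = [[1.0, 2.0], [3.0, 4.0]]
py_y_train = [[0.5], [1.5]]
py_X_test = [[5.0, 6.0]]

# One list of inputs: the converter is called once per input, with one argument.
r_X_train, r_y_train, r_X_test = map(
    to_r_stub, [py_X_train, py_y_train, py_X_test]
)

# Three separate iterables: map zips them and calls the converter with three
# arguments at once, which a one-argument function rejects.
try:
    list(map(to_r_stub, py_X_train, py_y_train, py_X_test))
    zipped_call_failed = False
except TypeError:
    zipped_call_failed = True

print(zipped_call_failed)
```

The same reasoning applies to any converter that takes a single array.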

Alternatively, if the function is saved in a separate .R script, you can source it as @agstudy shows here and then call it like any Python function.

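import rpy2.robjects as ro
from rpy2.robjects.numpy2ri import numpy2ri

# Run the R script so that R_pls is defined in R's global environment
ro.r('''source('my_R_pls_func.r')''')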

r_pls = ro.globalenv['R_pls']

# CONVERT PYTHON NUMPY MATRICES TO R OBJECTS
r_X_train, r_y_train, r_X_test = map(numpy2ri, [py_X_train, py_y_train, py_X_test])

# PASS R OBJECTS INTO FUNCTION (WILL NEED TO EXTRACT DFs FROM RESULT)
p_res = r_pls(r_X_train, r_y_train, r_X_test)
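As for the TypeError in the question: ro there is the rpy2.robjects module itself, and a Python module object cannot be called. The R evaluator is the module's r attribute (ro.r), and functions already defined in R are looked up through ro.globalenv. A minimal sketch reproducing the same error, with the standard-library math module standing in for rpy2.robjects (an assumption, so that the snippet runs without R installed):

```python
import math  # stand-in for the rpy2.robjects module (assumption for illustration)


def call_module_directly(module):
    """Call a module object the way ro('R_pls') does; return the error text."""
    try:
        module("R_pls")
    except TypeError as exc:
        return str(exc)
    return None


error_message = call_module_directly(math)
print(error_message)
```

Calling an attribute of the module (such as ro.r) rather than the module itself avoids this error.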

Upvotes: 5
