user5497

Slow glm calculation when using rpy2

I want to calculate logistic regression parameters using R's glm function. I'm working in Python and using rpy2 for that. For some reason, running glm directly in R gives me results much faster than running it through rpy2. Do you know why the calculation through rpy2 is so much slower? I'm using R 2.13.1 and rpy2 2.0.8. Here is the code I'm using:

import numpy
from rpy2 import robjects as ro
import rpy2.rlike.container as rlc

def train(self, x_values, y_values, weights):
    # Convert each column of the predictor matrix to an R numeric vector
    x_float_vector = [ro.FloatVector(x) for x in numpy.array(x_values).transpose()]
    y_float_vector = ro.FloatVector(y_values)
    weights_float_vector = ro.FloatVector(weights)
    # Name the predictor columns v0, v1, ... and the response y
    names = ['v' + str(i) for i in xrange(len(x_float_vector))]
    d = rlc.TaggedList(x_float_vector + [y_float_vector], names + ['y'])
    data = ro.RDataFrame(d)
    # Build the model formula "y ~ v0+v1+..."
    formula = 'y ~ ' + '+'.join(names)
    fit_res = ro.r.glm(formula=ro.r(formula), data=data, weights=weights_float_vector,
                       family=ro.r('binomial(link="logit")'))
    return fit_res

Answers (1)

lgautier

Without the full R code you are benchmarking against, it is difficult to pinpoint where the problem is.

You might want to run this through a Python profiler to see where the bottlenecks are.
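A minimal sketch of that profiling workflow, using the standard-library cProfile and pstats modules. `convert_columns` and `fit` are hypothetical stand-ins for the rpy2 conversion step and the `ro.r.glm` call, so the example runs without R; in practice you would wrap the real `train` call the same way and read which functions dominate the cumulative time.

```python
import cProfile
import io
import pstats


def convert_columns(rows):
    # Hypothetical stand-in for the Python-to-R conversion step:
    # transposes row-major data into one list per column.
    return [list(col) for col in zip(*rows)]


def fit(columns):
    # Hypothetical stand-in for the model-fitting call (here, column means).
    return [sum(col) / len(col) for col in columns]


def train(rows):
    return fit(convert_columns(rows))


rows = [[float(i + j) for j in range(5)] for i in range(20000)]

profiler = cProfile.Profile()
profiler.enable()
result = train(rows)
profiler.disable()

# Sort by cumulative time so the most expensive call chains are listed first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)
report = buffer.getvalue()
print(report)
```

If conversion shows up near the top of the report, the time is going into moving data between Python and R rather than into glm itself.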

Finally, the current release of rpy2 is 2.2.6. Besides the API changes, it runs faster and (presumably) has fewer bugs than 2.0.8.

Edit: From your comments, I now suspect that you are calling your function in a loop, and that a large fraction of the time is spent building R vectors that might only need to be built once.
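A minimal, R-free sketch of that idea, assuming the predictors and response stay fixed across iterations while only the weights change. `to_r_vector`, `build_data`, and `fit` are hypothetical stand-ins for `ro.FloatVector`, the `RDataFrame` construction, and the `ro.r.glm` call; the counter only tracks how many conversions each pattern performs.

```python
conversion_count = 0


def to_r_vector(values):
    # Hypothetical stand-in for ro.FloatVector(values); it only counts how
    # often a Python sequence is converted, which is the cost we want to cut.
    global conversion_count
    conversion_count += 1
    return tuple(values)


def fit(data, weights):
    # Hypothetical stand-in for ro.r.glm(...): a weighted mean of y.
    y = data["y"]
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)


def build_data(x_columns, y):
    # Convert every predictor column and the response once.
    data = {"v%d" % i: to_r_vector(col) for i, col in enumerate(x_columns)}
    data["y"] = to_r_vector(y)
    return data


x_columns = [[0.1, 0.4, 0.9, 0.3], [1.0, 0.0, 1.0, 0.0]]
y = [0.0, 1.0, 1.0, 0.0]
weight_sets = [[1.0, 1.0, 1.0, 1.0], [2.0, 1.0, 1.0, 2.0], [1.0, 3.0, 3.0, 1.0]]

# Slow pattern: the fixed data is rebuilt on every iteration.
conversion_count = 0
slow = [fit(build_data(x_columns, y), to_r_vector(w)) for w in weight_sets]
slow_conversions = conversion_count

# Faster pattern: build the fixed data once; convert only what changes.
conversion_count = 0
data = build_data(x_columns, y)
fast = [fit(data, to_r_vector(w)) for w in weight_sets]
fast_conversions = conversion_count

print(slow_conversions, fast_conversions)  # 12 6
```

Both patterns give the same fits; the second simply avoids re-converting data that has not changed. With rpy2 the same reasoning would presumably apply: build the data frame once outside the loop and pass it to each glm call.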
