Reputation: 243
I want to calculate logistic regression parameters using R's glm function. I'm working in Python and using rpy2 for that. For some reason, running glm directly in R gives me much faster results than going through rpy2. Do you know why the calculations through rpy2 are so much slower? I'm using R 2.13.1 and rpy2 2.0.8. Here is the code I'm using:
import numpy
from rpy2 import robjects as ro
import rpy2.rlike.container as rlc
def train(self, x_values, y_values, weights):
    # Build one R float vector per predictor column
    x_float_vector = [ro.FloatVector(x) for x in numpy.array(x_values).transpose()]
    y_float_vector = ro.FloatVector(y_values)
    weights_float_vector = ro.FloatVector(weights)
    # Name the predictors v0, v1, ... and assemble an R data frame
    names = ['v' + str(i) for i in xrange(len(x_float_vector))]
    d = rlc.TaggedList(x_float_vector + [y_float_vector], names + ['y'])
    data = ro.RDataFrame(d)
    # Build the model formula 'y ~ v0+v1+...'
    formula = 'y ~ '
    for x in names:
        formula += x + '+'
    formula = formula[:-1]
    # Fit a weighted logistic regression with R's glm
    fit_res = ro.r.glm(formula=ro.r(formula), data=data, weights=weights_float_vector, family=ro.r('binomial(link="logit")'))
Upvotes: 0
Views: 1189
Reputation: 11555
Without the full R code you are benchmarking against, it is difficult to precisely point out where the problem might be.
You might want to run this through a Python profiler to see where the bottlenecks are.
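For example, a minimal profiling run with the standard library's cProfile could look like the sketch below (here 'trainer', 'x_values', 'y_values' and 'weights' are just placeholders for whatever objects you actually use):

import cProfile
import pstats

# Profile one call to train() and dump the stats to a file;
# the argument names are placeholders for your own objects.
cProfile.run('trainer.train(x_values, y_values, weights)', 'train_profile')

# Print the 20 functions with the largest cumulative time.
stats = pstats.Stats('train_profile')
stats.sort_stats('cumulative').print_stats(20)

That should tell you whether the time goes into building the R objects or into the glm call itself.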
Finally, the current release of rpy2 is 2.2.6. Besides API changes, it runs faster and has (presumably) fewer bugs than 2.0.8.
Edit: From your comments I now suspect that you are calling your function in a loop, and that a large fraction of the time is spent building R vectors that might only need to be built once.
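As a rough illustration (using the same rpy2 2.0.x classes as in your snippet, and a hypothetical loop over several weight vectors), the data frame and formula could be built once outside the loop so that only the glm call is repeated:

# Build the R data frame and formula once...
x_float_vector = [ro.FloatVector(x) for x in numpy.array(x_values).transpose()]
y_float_vector = ro.FloatVector(y_values)
names = ['v' + str(i) for i in xrange(len(x_float_vector))]
d = rlc.TaggedList(x_float_vector + [y_float_vector], names + ['y'])
data = ro.RDataFrame(d)
formula = ro.r('y ~ ' + '+'.join(names))

# ...then reuse them for every fit; only the weights change per iteration.
for weights in all_weight_vectors:  # 'all_weight_vectors' is a placeholder
    fit_res = ro.r.glm(formula=formula, data=data,
                       weights=ro.FloatVector(weights),
                       family=ro.r('binomial(link="logit")'))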
Upvotes: 1