Reputation: 78903
Inspired by the linear models example from the docs, I'd like to print a nice summary after running an `lm` command.
When I run (see the final line in the example):

```python
print(base.summary(stats.lm('foo ~ bar')))
```
I get a whole function listing which starts as follows:
```
Call:
(function (formula, data, subset, weights, na.action, method = "qr",
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
    contrasts = NULL, offset, ...)
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
```
The desired R output comes at the bottom:
```
Coefficients:
    Estimate Std. Error t value Pr(>|t|)
foo   5.0320     0.2202   22.85 9.55e-15 ***
bar   4.6610     0.2202   21.16 3.62e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.9818,  Adjusted R-squared: 0.9798
F-statistic: 485.1 on 2 and 18 DF,  p-value: < 2.2e-16
```
This is moderately problematic, but it becomes unworkable when the data being fed to `lm` is a `pandas.DataFrame`, because `base.summary` seems to want to print all the data.
Is there a way to get just the nicely formatted R output in a `pd.DataFrame`, without all the extra gubbins?
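A possible way around the `Call:` block, sketched under assumptions rather than tested against a particular rpy2 version: the coefficient table that `summary` prints is stored in the `coefficients` component of the `summary.lm` object, so it can be pulled out directly instead of printing the whole summary. R stores that matrix column by column, so its flat values are reshaped with `order='F'`. The helper names `coef_matrix_to_df` and `summary_coefficients` are hypothetical, and `summary_coefficients` assumes the default rpy2 conversion, so that `rx2('coefficients')` comes back as an rpy2 `Matrix` with `rownames` and `colnames` rather than as a numpy array.

```python
import numpy as np
import pandas as pd


def coef_matrix_to_df(values, rownames, colnames):
    """Build a pd.DataFrame from an R matrix given as flat, column-major values."""
    nrow, ncol = len(rownames), len(colnames)
    # R stores matrices column by column, hence Fortran ('F') order.
    arr = np.asarray(list(values), dtype=float).reshape((nrow, ncol), order='F')
    return pd.DataFrame(arr, index=list(rownames), columns=list(colnames))


def summary_coefficients(fit):
    """Return the coefficient table of summary(fit) as a pd.DataFrame.

    Hypothetical helper. Assumes `fit` came from stats.lm(...) via rpy2 and
    that no numpy/pandas conversion is active, so the matrix stays an rpy2
    Matrix. Nothing is printed, so the `Call:` section never appears.
    """
    # Imported lazily because this part needs R and rpy2 installed.
    from rpy2.robjects.packages import importr
    base = importr('base')
    coefs = base.summary(fit).rx2('coefficients')
    return coef_matrix_to_df(list(coefs), coefs.rownames, coefs.colnames)
```

With `fit = stats.lm('foo ~ bar')`, `summary_coefficients(fit)` should give a frame indexed by the model terms with the `Estimate`, `Std. Error`, `t value` and `Pr(>|t|)` columns, and the size of the data passed to `lm` no longer matters because nothing is deparsed.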
Upvotes: 3
Views: 1136
Reputation: 78903
For posterity, here's a really nice way to get the numbers from an `lm` back into a `pd.DataFrame` (thanks to @Metrics for the tip-off about broom):
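```python
import numpy as np
import pandas as pd
from rpy2.robjects.packages import importr


def _run_regression(data, y_name):
    """
    Run a linear regression, in R, using `data` with dependent variable
    `y_name` and independent variables all other columns of `data`.
    """
    # Assumes pandas <-> R conversion is active (e.g. via
    # rpy2.robjects.pandas2ri.activate() in older rpy2 releases), so that a
    # pd.DataFrame can be passed straight in as `data`.
    stats = importr('stats')
    broom = importr('broom')
    # broom::tidy turns the fitted model into a data frame, one row per term.
    tidy_lm = broom.tidy(stats.lm('%s ~ .' % y_name, data=data))
    return _extract_R_df(tidy_lm).set_index('term')


def _extract_R_df(df):
    """
    Extract the R DataFrame `df` as a pd.DataFrame. This slightly
    longer method is necessary because `np.asarray(df)` drops the
    exponent on very small numbers!
    """
    # Convert column by column so each column keeps its own values.
    return pd.DataFrame({name: np.asarray(df.rx(name))[0] for name in df.names})
```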
Which results in a DataFrame similar to this:
```
                  estimate   p.value      statistic     std.error
term
(Intercept)  -3.709995e-16  0.000056  -4.712554e+00  7.872579e-17
x_is          8.000000e-01  0.000000   1.067919e+16  7.491204e-17
v_is          2.000000e-01  0.000000   2.107838e+15  9.488394e-17
d_ij         -2.000000e-01  0.000000  -2.970482e+14  6.732913e-16
d1            1.000000e-01  0.000000   4.045155e+14  2.472093e-16
d2            3.000000e-01  0.000000   5.320521e+14  5.638545e-16
d3            7.000000e-01  0.000000   1.779338e+15  3.934048e-16
```
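Since the question was after R's summary layout, the tidy columns can be mapped back onto the headers `summary` prints: broom's `estimate`, `std.error`, `statistic` and `p.value` correspond to `Estimate`, `Std. Error`, `t value` and `Pr(>|t|)`. A small pandas sketch, where `as_summary_table` and `TIDY_TO_SUMMARY` are hypothetical names and the input is assumed to be the `term`-indexed frame returned by `_run_regression`:

```python
import pandas as pd

# broom::tidy column names mapped to the headers summary(lm) prints,
# listed in the order summary(lm) shows them.
TIDY_TO_SUMMARY = {
    'estimate': 'Estimate',
    'std.error': 'Std. Error',
    'statistic': 't value',
    'p.value': 'Pr(>|t|)',
}


def as_summary_table(tidy_df):
    """Rename and reorder the columns of a tidied lm result to match summary(lm)."""
    renamed = tidy_df.rename(columns=TIDY_TO_SUMMARY)
    # Selecting by the mapped names also fixes the column order.
    return renamed[list(TIDY_TO_SUMMARY.values())]
```

Selecting with the dict's values both renames and reorders the columns, and the `term` index is left untouched.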
Upvotes: 2