Reputation: 23948

Regression Analysis with statsmodels in Python

I'm new to Python and learning how to do regression analysis with statsmodels (moving from R to Python and still thinking in R ways). My minimal working example is below:

```
import pandas as pd
import statsmodels.formula.api as smf

Income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Expend = [70,  65,  90,  95, 110, 115, 120, 140, 155, 150]

df1 = pd.DataFrame({'Income': Income, 'Expend': Expend})

# regression with formula
# instantiation
reg1 = smf.ols('Expend ~ Income', data=df1)

# members of reg object
print(dir(reg1))
```

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_data_attr', '_df_model', '_df_resid', '_fit_ridge', '_get_init_kwds', '_handle_data', '_init_keys', '_setup_score_hess', 'data', 'df_model', 'df_resid', 'endog', 'endog_names', 'exog', 'exog_names', 'fit', 'fit_regularized', 'formula', 'from_formula', 'get_distribution', 'hessian', 'information', 'initialize', 'k_constant', 'loglike', 'nobs', 'predict', 'rank', 'score', 'weights', 'wendog', 'wexog', 'whiten']

```
# members of the results object returned by fit()
print(dir(reg1.fit()))
```

['HC0_se', 'HC1_se', 'HC2_se', 'HC3_se', '_HCCM', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_cache', '_data_attr', '_get_robustcov_results', '_is_nested', '_wexog_singular_values', 'aic', 'bic', 'bse', 'centered_tss', 'compare_f_test', 'compare_lm_test', 'compare_lr_test', 'condition_number', 'conf_int', 'conf_int_el', 'cov_HC0', 'cov_HC1', 'cov_HC2', 'cov_HC3', 'cov_kwds', 'cov_params', 'cov_type', 'df_model', 'df_resid', 'eigenvals', 'el_test', 'ess', 'f_pvalue', 'f_test', 'fittedvalues', 'fvalue', 'get_influence', 'get_prediction', 'get_robustcov_results', 'initialize', 'k_constant', 'llf', 'load', 'model', 'mse_model', 'mse_resid', 'mse_total', 'nobs', 'normalized_cov_params', 'outlier_test', 'params', 'predict', 'pvalues', 'remove_data', 'resid', 'resid_pearson', 'rsquared', 'rsquared_adj', 'save', 'scale', 'ssr', 'summary', 'summary2', 't_test', 'tvalues', 'uncentered_tss', 'use_t', 'wald_test', 'wald_test_terms', 'wresid']

I want to understand the output of print(dir(reg1)) and print(dir(reg1.fit())). Where can I find documentation for these components, and examples of how to use them?

Upvotes: 2

Answers (4)

Devidas

Reputation: 2517

Some points to know about Python.

Python has built-in offline documentation. Try the help command in the Python interpreter:
```
>>> help(dir)
>>> help(help)
```
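help() writes to the screen. If you would rather have a docstring as a string, for example to search it, inspect.getdoc from the standard library returns the same text. A minimal sketch, assuming pandas and statsmodels are installed and reusing the question's data:

```python
import inspect

import pandas as pd
import statsmodels.formula.api as smf

df1 = pd.DataFrame({
    "Income": [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Expend": [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
})
reg1 = smf.ols("Expend ~ Income", data=df1)

# The docstring that help(reg1.fit) would display, returned as a str
# (inspect.getdoc also falls back to docstrings inherited from base classes)
fit_doc = inspect.getdoc(reg1.fit)
print(fit_doc)
```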
If you prefer to read the docs online, you can visit pydocs for general Python help, and PyPI (the Python Package Index) for package-specific help.
Now, specific to your problem: here is the help page for statsmodels, which redirects to its homepage.
Finally, here is a page which may interest you: Fitting models using R-style formulas.
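To give a flavour of what that page covers, here is a small sketch reusing the question's data. The extra terms (a quadratic via I() and dropping the intercept with - 1, as in R) are illustrative choices, not something the question asked for, and assume statsmodels' default patsy-based formula handling:

```python
import pandas as pd
import statsmodels.formula.api as smf

df1 = pd.DataFrame({
    "Income": [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Expend": [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
})

# I() makes ** ordinary arithmetic instead of formula syntax
quad = smf.ols("Expend ~ Income + I(Income ** 2)", data=df1).fit()

# "- 1" removes the intercept, the same way it does in R's lm()
no_int = smf.ols("Expend ~ Income - 1", data=df1).fit()

print(quad.params)
print(no_int.params)
```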

Upvotes: 1

akoeltringer

Reputation: 1721

Man, that is simple "googling" / reading the doc page. What may be confusing is the use of statsmodels.formula.api. It exists so that you can enter R-style formulas.

The statsmodels docs are located here: StatsModels Index Page. Scroll down until you reach "Table of Contents" and click on Linear Regression. Scrolling down to Module Reference, there are links to Model Classes and Result Classes.

As @Bill Bell already pointed out, the relevant model class is OLS. Under Methods, you can find the link to the documentation of fit, which states that fit returns a RegressionResults object.
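That split explains why the two dir() listings in the question differ: reg1 is the model (data plus formula, nothing estimated yet), while reg1.fit() returns a separate results object. A minimal sketch with the question's data; note that for OLS the returned object is a RegressionResultsWrapper around the RegressionResults instance the docs describe:

```python
import pandas as pd
import statsmodels.formula.api as smf

df1 = pd.DataFrame({
    "Income": [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Expend": [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
})

reg1 = smf.ols("Expend ~ Income", data=df1)  # model object
res = reg1.fit()                             # results object

print(type(reg1).__name__)  # OLS
print(type(res).__name__)   # RegressionResultsWrapper
print(res.model is reg1)    # True: the results keep a reference to their model
```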

The RegressionResults doc page explains the attributes you are interested in.
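For a quick orientation, here is a sketch of a few of those attributes on the question's model. The selection is just the ones an R user of summary(lm(...)) would likely reach for first:

```python
import pandas as pd
import statsmodels.formula.api as smf

df1 = pd.DataFrame({
    "Income": [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Expend": [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
})
res = smf.ols("Expend ~ Income", data=df1).fit()

print(res.params)      # estimated coefficients: Intercept, Income
print(res.bse)         # standard errors of the coefficients
print(res.tvalues)     # t statistics
print(res.pvalues)     # two-sided p-values for those t statistics
print(res.rsquared)    # R-squared (rsquared_adj is the adjusted version)
print(res.conf_int())  # confidence intervals, 95% by default (alpha=0.05)
print(res.summary())   # full table, similar to summary(lm(...)) in R
```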

Note that:

- attributes starting and ending with a double underscore __, e.g. __class__, are Python special attributes.
- you can get help inside IPython or a Jupyter notebook by appending ?, e.g. by typing reg1? (much like in R, where you prepend the ?). In the plain Python interpreter, use help(reg1) instead.
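Since the dunder names are mostly noise when exploring a results object, it can help to split the dir() output by naming convention. A small sketch with the question's data; by convention a single leading underscore marks internal helpers, so the public list is usually the one worth reading first:

```python
import pandas as pd
import statsmodels.formula.api as smf

df1 = pd.DataFrame({
    "Income": [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Expend": [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
})
res = smf.ols("Expend ~ Income", data=df1).fit()

names = dir(res)
# Python special attributes such as __class__
dunder = [n for n in names if n.startswith("__") and n.endswith("__")]
# Internal helpers (leading underscore, but not dunder)
private = [n for n in names if n.startswith("_") and n not in dunder]
# Public API: the attributes and methods documented on the results page
public = [n for n in names if not n.startswith("_")]

print(public)
```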

Upvotes: 1

Dilber

Reputation: 11

dir() lists all the attributes (including methods and variables) of an object or module, much like library(lme4); methods(class = "merMod") does in R. You can also try reg1.__dict__.
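The difference between the two is easy to see on the question's model: vars(reg1) (equivalent to reg1.__dict__) holds only the data stored on that instance, while dir(reg1) also lists methods and attributes inherited from its classes, such as fit. A small sketch:

```python
import pandas as pd
import statsmodels.formula.api as smf

df1 = pd.DataFrame({
    "Income": [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Expend": [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
})
reg1 = smf.ols("Expend ~ Income", data=df1)

# Only what is stored on this particular instance
instance_attrs = sorted(vars(reg1))
# Instance attributes plus everything reachable through the class hierarchy
all_names = dir(reg1)

print(instance_attrs)
print([n for n in all_names if n not in instance_attrs and not n.startswith("_")])
```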

Upvotes: 1

Bill Bell

Reputation: 21663

```
>>> reg1.__module__
'statsmodels.regression.linear_model'
```
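The module name plus the class name is also what the URL of the page below is built from. A hypothetical helper, doc_url, can guess such a page for other objects; the URL pattern is an assumption copied from the OLS link below, and not every class is guaranteed to have a page at that address:

```python
import pandas as pd
import statsmodels.formula.api as smf


def doc_url(obj, version="dev"):
    """Hypothetical helper: guess a statsmodels API page from an object's class.

    Assumes the pattern .../<version>/generated/<module>.<ClassName>.html,
    taken from the OLS link in this answer.
    """
    cls = type(obj)
    return (f"http://www.statsmodels.org/{version}/generated/"
            f"{cls.__module__}.{cls.__name__}.html")


df1 = pd.DataFrame({
    "Income": [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Expend": [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
})
reg1 = smf.ols("Expend ~ Income", data=df1)

print(reg1.__module__)  # statsmodels.regression.linear_model
print(doc_url(reg1))
```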

Googling for this gave me the page, http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html, which includes a link to fit.
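Besides reading that page, you can list fit's parameters from within Python with inspect.signature from the standard library. A sketch using the question's model:

```python
import inspect

import pandas as pd
import statsmodels.formula.api as smf

df1 = pd.DataFrame({
    "Income": [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Expend": [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
})
reg1 = smf.ols("Expend ~ Income", data=df1)

# Parameter names and defaults of the bound fit method
sig = inspect.signature(reg1.fit)
print(sig)
```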

I don't know that this has everything you need. I hope it's a leg up.

Upvotes: 0
