Reputation: 23948
I'm new to Python and learning how to do regression analysis with statsmodels (I'm moving from R to Python and still think in R terms). My minimal working example is below:
import pandas as pd
import statsmodels.formula.api as smf

Income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Expend = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
df1 = pd.DataFrame({'Income': Income, 'Expend': Expend})

# regression with an R-style formula
# instantiation: this builds the model object but does not fit it yet
reg1 = smf.ols('Expend ~ Income', data=df1)
# members of the model object
print(dir(reg1))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_data_attr', '_df_model', '_df_resid', '_fit_ridge', '_get_init_kwds', '_handle_data', '_init_keys', '_setup_score_hess', 'data', 'df_model', 'df_resid', 'endog', 'endog_names', 'exog', 'exog_names', 'fit', 'fit_regularized', 'formula', 'from_formula', 'get_distribution', 'hessian', 'information', 'initialize', 'k_constant', 'loglike', 'nobs', 'predict', 'rank', 'score', 'weights', 'wendog', 'wexog', 'whiten']
# members of the results object returned by fit()
print(dir(reg1.fit()))
['HC0_se', 'HC1_se', 'HC2_se', 'HC3_se', '_HCCM', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_cache', '_data_attr', '_get_robustcov_results', '_is_nested', '_wexog_singular_values', 'aic', 'bic', 'bse', 'centered_tss', 'compare_f_test', 'compare_lm_test', 'compare_lr_test', 'condition_number', 'conf_int', 'conf_int_el', 'cov_HC0', 'cov_HC1', 'cov_HC2', 'cov_HC3', 'cov_kwds', 'cov_params', 'cov_type', 'df_model', 'df_resid', 'eigenvals', 'el_test', 'ess', 'f_pvalue', 'f_test', 'fittedvalues', 'fvalue', 'get_influence', 'get_prediction', 'get_robustcov_results', 'initialize', 'k_constant', 'llf', 'load', 'model', 'mse_model', 'mse_resid', 'mse_total', 'nobs', 'normalized_cov_params', 'outlier_test', 'params', 'predict', 'pvalues', 'remove_data', 'resid', 'resid_pearson', 'rsquared', 'rsquared_adj', 'save', 'scale', 'ssr', 'summary', 'summary2', 't_test', 'tvalues', 'uncentered_tss', 'use_t', 'wald_test', 'wald_test_terms', 'wresid']
I want to understand the output of print(dir(reg1)) and print(dir(reg1.fit())). Where can I find documentation for these components, along with examples of how to use them?
Upvotes: 2
Views: 1064
Reputation: 2517
Some points to know about Python.
Python has built-in offline documentation. In the Python interpreter, try the help command:
>>> help(dir)
>>> help(help)
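help() pages its output in an interactive session. If you would rather have the same text as a string (to search it or save it), a minimal sketch using the standard library's pydoc module could look like this; help_text is a hypothetical helper name, not part of any library:

```python
import pydoc


def help_text(obj) -> str:
    """Return essentially the text help(obj) displays, as a plain string."""
    # pydoc.plaintext renders without the terminal bold (backspace) escapes
    return pydoc.render_doc(obj, renderer=pydoc.plaintext)


if __name__ == "__main__":
    text = help_text(dir)
    # Show only the first few lines instead of paging through everything
    print("\n".join(text.splitlines()[:5]))
```

With the question's objects, help(reg1.fit) or help_text(reg1.fit) shows the docstring of the fit method without leaving the interpreter.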
If you prefer online documentation, visit pydocs for general help, and PyPI (the Python Package Index) for package-specific help.
Now, specific to your problem: here is the help for statsmodels, which redirects to its homepage.
Finally, here is a page which may interest you: Fitting models using R-style formulas.
Upvotes: 1
Reputation: 1721
This is mostly a matter of googling and reading the doc pages. What may be confusing is the use of statsmodels.formula.api, which exists so that you can enter R-style formulas.
The statsmodels docs are located here: StatsModels Index Page. Scroll down until you reach "Table of Contents" and click on Linear Regression. Scrolling down to Module Reference, you will find links to Model Classes and Result Classes.
The correct model class was already pointed out by @Bill Bell: it is OLS. Under Methods, you can find the link to the documentation of fit, which states that fit returns a RegressionResults object.
The RegressionResults doc page explains the attributes you are interested in.
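Most of the names in dir(reg1.fit()) live on this results object. Here is a sketch, assuming pandas and statsmodels are installed, that uses a few of them next to their closest R counterparts; the R mappings in the comments are informal analogies, not an official correspondence.

```python
import pandas as pd
import statsmodels.formula.api as smf

df1 = pd.DataFrame({
    "Income": [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "Expend": [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
})

# fit() returns the results object whose members dir(reg1.fit()) lists
res = smf.ols("Expend ~ Income", data=df1).fit()

print(type(res).__name__)        # RegressionResultsWrapper, wrapping RegressionResults
print(res.params)                # estimated coefficients, like coef(fit) in R
print(res.bse)                   # standard errors of the coefficients
print(res.tvalues, res.pvalues)  # t statistics and their p-values
print(res.conf_int())            # 95% confidence intervals, like confint(fit)
print(res.rsquared, res.rsquared_adj)
print(res.fittedvalues.head())   # like fitted(fit)
print(res.resid.head())          # like residuals(fit)
print(res.summary())             # like summary(fit)
```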
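Note that:
- names surrounded by double underscores (__), e.g. __class__, are Python special attributes;
- in IPython or Jupyter you can get help on an object by appending ?, e.g. by typing reg1? (much like in R, where you prepend the ?).
Upvotes: 1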
Reputation: 11
dir() lists the attributes, methods and variables of an object or module, just like library(lme4) followed by methods(class = "merMod") in R. You can also try reg1.__dict__.
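Building on dir(), here is a small standard-library sketch that hides the double-underscore and private names and pulls the first docstring line for each remaining member of a class. It also shows that __dict__ (what vars() returns) holds only an instance's own attributes, while dir() also includes methods from its class. The names public_members, describe_members and Demo are hypothetical.

```python
import inspect


def public_members(obj):
    """Names from dir(obj), skipping dunder (__x__) and private (_x) names."""
    return [name for name in dir(obj) if not name.startswith("_")]


def describe_members(cls):
    """Map each public member of a class to the first line of its docstring."""
    summary = {}
    for name in public_members(cls):
        # Looking names up on the class (not an instance) avoids evaluating
        # properties, so we get the property's docstring instead of its value.
        member = getattr(cls, name, None)
        if callable(member) or isinstance(member, property):
            doc = inspect.getdoc(member) or ""
        else:
            doc = ""  # plain data attributes have no docstring of their own
        summary[name] = doc.splitlines()[0] if doc else ""
    return summary


class Demo:
    """Hypothetical example class."""

    def __init__(self):
        self.value = 1  # instance attribute, stored in the instance's __dict__

    def method(self):
        """Return the stored value."""
        return self.value


d = Demo()
print(public_members(d))       # ['method', 'value']
print(describe_members(Demo))  # {'method': 'Return the stored value.'}
print(vars(d))                 # {'value': 1}, the same dict as d.__dict__
```

For the question's objects, public_members(reg1) gives the names without underscores, and describe_members(type(reg1)) gives one-line docs for the OLS class. For the results, passing statsmodels.regression.linear_model.RegressionResults is likely more informative than type(reg1.fit()), which is a wrapper class; attributes built with other descriptor types may come back with empty docs.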
Upvotes: 1
Reputation: 21663
>>> reg1.__module__
'statsmodels.regression.linear_model'
Googling for this gave me the page http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html, which includes a link to fit.
I don't know that this has everything you need. I hope it's a leg up.
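The __module__ lookup generalizes: combining a class's __module__ and __qualname__ gives a fully qualified name like the one that appears in the docs URL above, which makes a good search term. qualified_name is a hypothetical helper, sketched with the standard library only:

```python
def qualified_name(obj) -> str:
    """Return 'module.ClassName' for obj's class (or for obj itself if it is a class)."""
    cls = obj if isinstance(obj, type) else type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


print(qualified_name(1))             # builtins.int
print(qualified_name(ValueError()))  # builtins.ValueError
```

For the question's model, qualified_name(reg1) should give 'statsmodels.regression.linear_model.OLS'. For reg1.fit() it gives statsmodels.regression.linear_model.RegressionResultsWrapper, so searching for RegressionResults is the better bet there.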
Upvotes: 0