Reputation: 33
(A variation of this post, without the detailed traceback, was posted on SO about two hours ago. This version contains the whole traceback.)
I am running StatsModels to get parameter estimates from ordinary least squares (OLS). The data-processing and model-specific commands are shown below. When I use import statsmodels.formula.api as sm as the operative API, the OLS works as desired (after I drop some 15 rows programmatically) and gives intuitive results. But when I switch to import statsmodels.api as sm as the binding API, with almost no change to the code, things fall apart, and the Python interpreter raises an error saying that 'inc_2' is not in the index. Mind you, inc_2 was computed after the dataframe was read in, in both model runs, and yet the first run succeeded and the second did not. (BTW, p_c_inc_18 is per-capita income, and inc_2 is the former squared. inc_2 is the offending element in the second run.)
import pandas as pd
import numpy as np
import statsmodels.api as sm
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
eg=pd.read_csv(r'C:/../../../une_edu_pipc_06.csv')
pd.options.display.precision = 3
plt.rc("figure", figsize=(16,8))
plt.rc("font", size=14)
sm_col = eg["lt_hsd_17"] + eg["hsd_17"]
eg["ut_hsd_17"] = sm_col
sm_col2 = eg["sm_col_17"] + eg["col_17"]
eg["bnd_hsd_17"] = sm_col2
eg["d_09"]= eg["Rate_09"]-eg["Rate_06"]
eg["d_10"]= eg["Rate_10"]-eg["Rate_06"]
inc_2=eg["p_c_inc_18"]*eg["p_c_inc_18"]
X = eg[["p_c_inc_18","ut_hsd_17","d_10","inc_2"]]
y = eg["Rate_18"]
X = sm.add_constant(X)
mod = sm.OLS(y, X)
res = mod.fit()
print(res.summary())
Here is the traceback in full.
KeyError Traceback (most recent call last)
<ipython-input-21-e2f4d325145e> in <module>
17 eg["d_10"]= eg["Rate_10"]-eg["Rate_06"]
18 inc_2=eg["p_c_inc_18"]*eg["p_c_inc_18"]
---> 19 X = eg[["p_c_inc_18","ut_hsd_17","d_10","inc_2"]]
20 y = eg["Rate_18"]
21 X = sm.add_constant(X)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2804 if is_iterator(key):
2805 key = list(key)
-> 2806 indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
2807
2808 # take() does not accept boolean indexers
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1550 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1551
-> 1552 self._validate_read_indexer(
1553 keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
1554 )
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1644 if not (self.name == "loc" and not raise_missing):
1645 not_found = list(set(key) - set(ax))
-> 1646 raise KeyError(f"{not_found} not in index")
1647
1648 # we skip the warning on Categorical/Interval
KeyError: "['inc_2'] not in index"
What am I doing wrong?
Upvotes: 3
Views: 95
Reputation: 77885
The syntax you used requires every string in the list to be a legal index into eg. If you print(eg), you'll see that it has no inc_2 element. I think what you meant was to make a list of elements, each indexed by a single string.
X = [
eg["p_c_inc_18"],
eg["ut_hsd_17"],
eg["d_10"],
eg["inc_2"]
]
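Note that eg["inc_2"] in the list above would raise the same KeyError until that column exists. Line 18 of the traceback assigns the product to a standalone variable named inc_2, not to a column of eg. A minimal sketch of one possible fix follows, assuming that is the cause. The small data frame is a made-up stand-in for the CSV, which is not available here.

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question; the values are made up.
eg = pd.DataFrame({
    "p_c_inc_18": [30.0, 42.5, 51.0, 38.2],
    "ut_hsd_17": [12.0, 9.5, 7.1, 10.3],
    "d_10": [4.1, 3.2, 2.8, 3.9],
})

# As in the question: the square lives in a standalone variable,
# so eg[[...]] cannot see it, because it only looks at column labels.
inc_2 = eg["p_c_inc_18"] * eg["p_c_inc_18"]
try:
    eg[["p_c_inc_18", "inc_2"]]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Storing the square as a column makes the list-of-labels selection valid.
eg["inc_2"] = eg["p_c_inc_18"] ** 2
X = eg[["p_c_inc_18", "ut_hsd_17", "d_10", "inc_2"]]
print(list(X.columns))
```

With the column in place, the rest of the question's code (sm.add_constant(X), sm.OLS(y, X)) should receive a regular DataFrame. This reading would also be consistent with the formula run succeeding: as far as I know, formula-based models look up names that are missing from the data frame in the calling namespace, so a standalone inc_2 can be found there, whereas eg[[...]] only sees columns.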
Upvotes: 0