A squared variable is outside the index

Question

(A variation of this post, without the detailed traceback, was posted on SO about two hours ago. This version contains the whole traceback.)

I am running StatsModels to get parameter estimates from ordinary least squares (OLS). The data-processing and model-specific commands are shown below. When I use import statsmodels.formula.api as sm as the operative API, the OLS works as desired (after I drop some 15 rows programmatically), giving intuitive results. But when I switch to import statsmodels.api as sm as the binding API, with almost no change to the code, things fall apart, and the Python interpreter raises an error saying that 'inc_2' is not in the index. Mind you, inc_2 was computed after the dataframe was read in, in both model runs, and yet the first run succeeded while the second did not. (BTW, p_c_inc_18 is per-capita income, and inc_2 is the former squared. inc_2 is the offending element in the second run.)

import pandas as pd
import numpy as np
import statsmodels.api as sm
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
eg=pd.read_csv(r'C:/../../../une_edu_pipc_06.csv')
pd.options.display.precision = 3
plt.rc("figure", figsize=(16,8))
plt.rc("font", size=14)
sm_col = eg["lt_hsd_17"] + eg["hsd_17"]
eg["ut_hsd_17"] = sm_col
sm_col2 = eg["sm_col_17"] + eg["col_17"]
eg["bnd_hsd_17"] = sm_col2
eg["d_09"]= eg["Rate_09"]-eg["Rate_06"]
eg["d_10"]= eg["Rate_10"]-eg["Rate_06"]
inc_2=eg["p_c_inc_18"]*eg["p_c_inc_18"]
X = eg[["p_c_inc_18","ut_hsd_17","d_10","inc_2"]]
y = eg["Rate_18"]
X = sm.add_constant(X)
mod = sm.OLS(y, X)
res = mod.fit()
print(res.summary())

Here is the traceback in full.

KeyError                                  Traceback (most recent call last)
 in 
     17 eg["d_10"]= eg["Rate_10"]-eg["Rate_06"]
     18 inc_2=eg["p_c_inc_18"]*eg["p_c_inc_18"]
---> 19 X = eg[["p_c_inc_18","ut_hsd_17","d_10","inc_2"]]
     20 y = eg["Rate_18"]
     21 X = sm.add_constant(X)

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2804             if is_iterator(key):
   2805                 key = list(key)
-> 2806             indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
   2807 
   2808         # take() does not accept boolean indexers

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1550             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1551 
-> 1552         self._validate_read_indexer(
   1553             keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
   1554         )

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1644             if not (self.name == "loc" and not raise_missing):
   1645                 not_found = list(set(key) - set(ax))
-> 1646                 raise KeyError(f"{not_found} not in index")
   1647 
   1648             # we skip the warning on Categorical/Interval

KeyError: "['inc_2'] not in index"

What am I doing wrong?


Answers (1)
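
A likely explanation, judging from the traceback: the line inc_2=eg["p_c_inc_18"]*eg["p_c_inc_18"] creates a free-standing Series named inc_2, but it never adds a column called "inc_2" to eg, so eg[[..., "inc_2"]] raises the KeyError. The formula API probably worked because patsy resolves names in a formula first in the data and then in the calling environment, where the inc_2 variable existed (this assumes the formula referred to inc_2 by name, which the post does not show). The sketch below reproduces the error and the fix on a small made-up frame; the numbers are illustrative and not from the original CSV.

```python
import pandas as pd

# Made-up stand-in for the CSV in the question; values are illustrative only.
eg = pd.DataFrame({
    "p_c_inc_18": [30.0, 42.5, 51.0, 38.2],
    "Rate_18": [4.1, 3.6, 3.0, 3.9],
})

# This creates a free-standing Series; eg itself gains no new column.
inc_2 = eg["p_c_inc_18"] * eg["p_c_inc_18"]
has_column_before = "inc_2" in eg.columns

# Selecting it by name therefore fails, as in the traceback.
try:
    eg[["p_c_inc_18", "inc_2"]]
    raised = False
except KeyError:
    raised = True

# Assigning the result back to the frame makes it a real column.
eg["inc_2"] = eg["p_c_inc_18"] ** 2
X = eg[["p_c_inc_18", "inc_2"]]
y = eg["Rate_18"]

print(has_column_before, raised, list(X.columns))
```

With eg["inc_2"] assigned this way, the remaining lines from the question (sm.add_constant(X), sm.OLS(y, X), fit) should run unchanged.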
