Reputation: 11
I am doing stepwise feature selection with statsmodels (imported via import statsmodels.api as sm). When I run the code below, I get this error:
ValueError: list.remove(x): x not in list
def stepwise_selection(x, y,
                       initial_list=['discount', 'sla', 'product_procurement_sla', 'order_payment_type',
                                     'online_order_perc', 'TV_ads', 'Sponsorship_ads', 'Content_marketing_ads',
                                     'Online_marketing_ads', 'NPS', 'Stock_Index', 'Special_sales', 'Payday',
                                     'heat_deg_days', 'cool_deg_days', 'total_rain_mm', 'total_snow_cm',
                                     'snow_on_grnd_cm', 'MA4_listed_price', 'MA2_discount_offer'],
                       threshold_in=0.01, threshold_out=0.05, verbose=True):
    included = list(initial_list)
    while True:
        changed = False
        # forward step
        excluded = list(set(x.columns) - set(included))
        new_pval = pd.Series(index=excluded)
        for new_column in excluded:
            model = sm.OLS(y, sm.add_constant(pd.DataFrame(x[included + [new_column]]))).fit()
            new_pval[new_column] = model.pvalues[new_column]
        best_pval = new_pval.min()
        if best_pval < threshold_in:
            best_feature = new_pval.argmin()
            included.append(best_feature)
            changed = True
            if verbose:
                print('Add {:30} with p-value {:.6}'.format(best_feature, best_pval))
        # backward step
        model = sm.OLS(y, sm.add_constant(pd.DataFrame(x[included]))).fit()
        # use all coefs except intercept
        pvalues = model.pvalues.iloc[1:]
        worst_pval = pvalues.max()  # null if pvalues is empty
        if worst_pval > threshold_out:
            changed = True
            worst_feature = pvalues.argmax()
            included.remove(worst_feature)
            if verbose:
                print('Drop {:30} with p-value {:.6}'.format(worst_feature, worst_pval))
        if not changed:
            break
    return included

import statsmodels.api as sm
final_features = stepwise_selection(x, y)
print("\n", "final_selected_features:", final_features)
The error is raised at this line:
included.remove(worst_feature)
I tried using del instead, but that just raises a different error.
Upvotes: 1
Views: 30
Reputation: 76943
included is a list and worst_feature is not part of that list, so removing it fails with the error you mention in the question. You can check whether the element is in the list first, like this:
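worst_name = pvalues.index[worst_feature]
if worst_name in included:
    included.remove(worst_name)
Note that I'm looking up the feature name located at the position of the worst feature, because in pandas 1.0 and later argmax returns an integer position rather than a label.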
This would only technically solve your issue, though, because the real culprit is a logic issue. You want to remove the worst included feature by name, but pvalues.argmax() gives you its integer position, and that integer is never an element of included, which holds column names. A better fix is to ask for the label directly with pvalues.idxmax() (and likewise new_pval.idxmin() in the forward step), so that the value passed to included.remove() is always a column name.
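As a minimal sketch of the backward step, assuming pandas 1.0 or later (where Series.argmax returns an integer position and Series.idxmax returns the index label): the helper name drop_worst_feature and the sample p-values below are made up for illustration. In the question's code the Series would come from model.pvalues.iloc[1:].

```python
import pandas as pd


def drop_worst_feature(included, pvalues, threshold_out=0.05):
    """Remove the feature with the largest p-value if it exceeds threshold_out.

    Hypothetical helper: `included` is a list of column names and `pvalues`
    is a Series indexed by those names. Returns the dropped name, or None.
    """
    if pvalues.empty:
        return None
    worst_feature = pvalues.idxmax()  # index label, i.e. a column name
    if pvalues[worst_feature] > threshold_out:
        included.remove(worst_feature)  # works: the label is in the list
        return worst_feature
    return None


# Made-up p-values for three of the question's columns.
pvalues = pd.Series({'discount': 0.001, 'NPS': 0.40, 'Payday': 0.03})
included = list(pvalues.index)

position = pvalues.argmax()  # integer position, not a column name
dropped = drop_worst_feature(included, pvalues)
print(position, dropped, included)
```

Passing position to included.remove() would reproduce the ValueError from the question, while the label returned by idxmax is removed without a membership check.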
Upvotes: 0
Reputation: 460
list.remove(x) expects the value you want to remove, but here you pass the index (an integer position) returned by pvalues.argmax().
See https://docs.python.org/3/tutorial/datastructures.html
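To illustrate the difference with a plain list (the column names are taken from the question, the position 2 is made up for the example):

```python
included = ['discount', 'sla', 'NPS']

# remove() searches for an element equal to its argument.
try:
    included.remove(2)  # 2 is a position, not an element of the list
except ValueError as exc:
    message = str(exc)
    print(message)

# To delete by position, use del (or pop()).
by_position = list(included)
del by_position[2]

# To delete by value, pass the value itself.
by_value = list(included)
by_value.remove('NPS')

print(by_position, by_value)
```

Both approaches end up with the same list here. Which one fits depends on whether you hold a position or a name.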
Upvotes: 0