user25582985

Reputation: 11

ValueError: list.remove(x): x not in list while doing Model Building - Stepwise selection for feature selection

I am doing stepwise selection for feature selection using statsmodels.api as sm, and when I run the code I get the error ValueError: list.remove(x): x not in list

for the below piece of code

def stepwise_selection(x, y,
                       initial_list=['discount', 'sla','product_procurement_sla', 'order_payment_type',
       'online_order_perc', 'TV_ads','Sponsorship_ads', 'Content_marketing_ads', 'Online_marketing_ads',
       'NPS', 'Stock_Index', 'Special_sales', 'Payday', 'heat_deg_days', 'cool_deg_days', 
       'total_rain_mm', 'total_snow_cm','snow_on_grnd_cm', 'MA4_listed_price',
       'MA2_discount_offer'],
                       threshold_in=0.01,threshold_out = 0.05, verbose=True):
    
    included = list(initial_list)
    while True:
        changed=False
        ###forward step
        excluded = list(set(x.columns)-set(included))
        new_pval = pd.Series(index=excluded)
        for new_column in excluded:
            model = sm.OLS(y, sm.add_constant(pd.DataFrame(x[included+[new_column]]))).fit()
            new_pval[new_column] = model.pvalues[new_column]
        best_pval = new_pval.min()
        if best_pval < threshold_in:
            best_feature = new_pval.argmin()
            included.append(best_feature)
            changed=True
            if verbose:
                print('Add  {:30} with p-value {:.6}'.format(best_feature, best_pval))
                
                
        ###backward step
        model = sm.OLS(y, sm.add_constant(pd.DataFrame(x[included ]))).fit()
        ###use all coefs except intercept
        pvalues = model.pvalues.iloc[1:]
        worst_pval = pvalues.max() ###null if pvalues is empty
        if worst_pval > threshold_out:
            changed=True
            worst_feature = pvalues.argmax()
            included.remove(worst_feature)
            if verbose:
                print('Drop {:30} with p-value {:.6}'.format(worst_feature, worst_pval))
        if not changed:
            break
    return included
import statsmodels.api as sm  

final_features = stepwise_selection(x, y)

print("\n","final_selected_features:",final_features)

at line

 included.remove(worst_feature)

I tried using del instead, but that raises a different error.

Upvotes: 1

Views: 30

Answers (2)

Lajos Arpad

Reputation: 76943

included is a list and worst_feature is not part of that list, hence removing it fails and you get the error you have mentioned in the question. You can check whether the element is in the list, like

if pvalues.index[worst_feature] in included:
    included.remove(pvalues.index[worst_feature])

Note that I'm looking up the feature name (the index label) located at the position of the worst feature, rather than using that position itself.

This would only technically solve your issue though, because you basically have a logic issue as the culprit. You want to remove the worst included feature, but you search for the worst feature among all features and if the very worst feature happens not to be included, then you will get the error you received. So, a better way to solve this would be to search for the worst feature among the included features rather than among all features.
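As a rough illustration of searching only among the included features, here is a minimal sketch. The helper name drop_worst_feature and the sample p-values are hypothetical, not part of the question's code; it assumes pvalues is a pandas Series indexed by feature name, as model.pvalues is in statsmodels.

```python
import pandas as pd


def drop_worst_feature(included, pvalues, threshold_out=0.05):
    """Remove the included feature with the largest p-value, if it exceeds
    threshold_out. Returns the removed name, or None if nothing was removed.

    Hypothetical helper for illustration only.
    """
    # Keep only p-values that belong to included features (this skips 'const').
    candidates = pvalues[pvalues.index.isin(included)]
    if candidates.empty:
        return None
    worst_feature = candidates.idxmax()  # index label, i.e. the feature name
    if candidates[worst_feature] > threshold_out:
        included.remove(worst_feature)
        return worst_feature
    return None


# Made-up p-values, including an intercept row as statsmodels would report.
sample_pvalues = pd.Series({"const": 0.90, "discount": 0.01, "sla": 0.20})
features = ["discount", "sla"]
removed = drop_worst_feature(features, sample_pvalues)
```

Because the lookup is by label and restricted to names already in the list, the remove call cannot fail with "x not in list".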

Upvotes: 0

BDurand

Reputation: 460

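When calling list.remove(x) you need to pass the value you want to remove, while here you pass the index returned by pvalues.argmax(). In current pandas versions, Series.argmax() returns an integer position, not the feature name; Series.idxmax() returns the index label, which is what list.remove needs here.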

See https://docs.python.org/3/tutorial/datastructures.html
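The difference can be seen in a small, self-contained sketch. The feature names are taken from the question, but the p-values are made up for illustration, and the behaviour shown assumes a recent pandas release where argmax() returns a position.

```python
import pandas as pd

# Hypothetical p-values for three features, indexed by feature name.
pvalues = pd.Series([0.01, 0.40, 0.03], index=["discount", "sla", "NPS"])
included = ["discount", "sla", "NPS"]

position = pvalues.argmax()  # integer position of the largest p-value
label = pvalues.idxmax()     # index label (feature name) of the largest p-value

try:
    included.remove(position)  # the position is not an element of the list
    remove_by_position_failed = False
except ValueError:
    # This is the "list.remove(x): x not in list" error from the question.
    remove_by_position_failed = True

included.remove(label)  # removing by feature name works
```

The same distinction applies to new_pval.argmin() in the forward step of the question's code, where idxmin() would give the feature name.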

Upvotes: 0
