SHARVARI WANJARI

Negative and positive % increase in MSE for the same feature over previous days

I have a DataFrame with past-day and current-day data. Example columns: [cases, mobility, temp, rh, cases_1, mobility_1, temp_1, rh_1, cases_2, mobility_2, temp_2, rh_2, ...]. My target column (Y) is 'cases', and col_i denotes the value of parameter col i days before the current day. The code for %IncMSE is given below. The results have mixed signs for the same feature: for example, temp has a positive %IncMSE, temp_1 has a negative one, and temp_2 can be either positive or negative. The same happens for the other features. How should I interpret such results, and how can I find the combined, cumulative effect of the past-day columns?

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

# rf_regressor (a fitted RandomForestRegressor), X_pastdays_test and y_test
# are assumed to be defined earlier.
feature_importance_df = pd.DataFrame({'Feature Name': X_pastdays_test.columns})

# Baseline MSE on the unmodified test data
y_pred_baseline = rf_regressor.predict(X_pastdays_test)
baseline_mse = mean_squared_error(y_test, y_pred_baseline)

percent_inc_mse_list = []
for feature_name in X_pastdays_test.columns:
    # Permute the values of the feature in the test data
    x_test_perturbed = X_pastdays_test.copy()
    x_test_perturbed[feature_name] = np.random.permutation(x_test_perturbed[feature_name])

    # Make predictions with the perturbed data
    y_pred_perturbed = rf_regressor.predict(x_test_perturbed)

    # Calculate the mean squared error with the perturbed data
    perturbed_mse = mean_squared_error(y_test, y_pred_perturbed)

    # Calculate %Inc MSE as a percentage increase in MSE over the baseline
    percent_inc_mse = ((perturbed_mse - baseline_mse) / baseline_mse) * 100
    percent_inc_mse_list.append(percent_inc_mse)

feature_importance_df['%Inc MSE'] = percent_inc_mse_list
feature_importance_df = feature_importance_df.sort_values(by='%Inc MSE', ascending=False)
print(feature_importance_df)
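
A single random permutation per column makes each %IncMSE value noisy. A value close to zero, whether slightly positive or slightly negative, usually means the model does not rely much on that column; a negative value only means the shuffled column happened to give a slightly lower test MSE than the baseline. Lagged columns such as temp, temp_1 and temp_2 also tend to be correlated, so a random forest can substitute one for another, and permuting them one at a time can understate their joint importance. One common way to measure the combined effect of all lags of a feature is grouped permutation importance: shuffle every lag of that feature with the same row permutation, repeat several times, and report the mean and standard deviation. The sketch below is a suggestion on synthetic data, not the asker's setup; the helpers base_name and grouped_percent_inc_mse are hypothetical, and the grouping assumes the col_i naming from the question with base names that do not themselves end in _<digits>.

```python
import re

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error


def base_name(column):
    """Hypothetical helper: map 'temp_2' -> 'temp' by stripping a trailing _<digits> lag suffix."""
    return re.sub(r"_\d+$", "", column)


def grouped_percent_inc_mse(model, X, y, n_repeats=20, seed=0):
    """%IncMSE when all lags of one feature are shuffled together, averaged over n_repeats."""
    rng = np.random.default_rng(seed)
    baseline_mse = mean_squared_error(y, model.predict(X))

    # Group columns by base feature name: {'temp': ['temp', 'temp_1', 'temp_2'], ...}
    groups = {}
    for col in X.columns:
        groups.setdefault(base_name(col), []).append(col)

    rows = []
    for name, cols in groups.items():
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # One shared row permutation for the whole group keeps the lags
            # consistent with each other but breaks their link with y.
            idx = rng.permutation(len(X))
            X_perm[cols] = X[cols].to_numpy()[idx]
            perm_mse = mean_squared_error(y, model.predict(X_perm))
            scores.append(100 * (perm_mse - baseline_mse) / baseline_mse)
        rows.append({
            "Feature group": name,
            "n_columns": len(cols),
            "%Inc MSE mean": float(np.mean(scores)),
            "%Inc MSE std": float(np.std(scores)),
        })
    return (pd.DataFrame(rows)
            .sort_values("%Inc MSE mean", ascending=False)
            .reset_index(drop=True))


# Synthetic demo data (not the asker's): y depends on the temp lags only, rh is noise.
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame({c: rng.normal(size=n)
                  for c in ["temp", "rh", "temp_1", "rh_1", "temp_2", "rh_2"]})
y = 3 * X["temp"] + 2 * X["temp_1"] + X["temp_2"] + rng.normal(scale=0.1, size=n)

# Hold out the last 100 rows so importance is measured on unseen data
X_train, X_test = X.iloc[:300], X.iloc[300:]
y_train, y_test = y.iloc[:300], y.iloc[300:]
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

result = grouped_percent_inc_mse(model, X_test, y_test)
print(result)
```

Summing the individual per-column %IncMSE values is generally not a reliable substitute for the grouped score, because correlated lags share credit and their effects are not additive. Comparing each group's mean against its standard deviation also gives a rough sense of whether a small positive or negative value is distinguishable from permutation noise.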