I have a dataframe with current-day and past-day data. Example columns: [cases, mobility, temp, rh, cases_1, mobility_1, temp_1, rh_1, cases_2, mobility_2, temp_2, rh_2, and so on]. My target column (Y) is 'cases', and col_i denotes the value of that parameter i days before the current day. The code I use to compute %Inc MSE is given below. With it, a feature such as temp gets a positive value while temp_1 gets a negative one, and temp_2 can be either positive or negative; the same happens for the other features. How do I interpret such results, and how do I find the combined (cumulative) effect of the past-day columns for each feature?
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from tabulate import tabulate

feature_importance_df = pd.DataFrame({'Feature Name': X_pastdays_test.columns})

# Baseline MSE of the fitted model on the unmodified test data
y_pred_baseline = rf_regressor.predict(X_pastdays_test)
baseline_mse = mean_squared_error(y_test, y_pred_baseline)

percent_inc_mse_list = []
for feature_name in X_pastdays_test.columns:
    # Permute the values of the feature in the test data
    x_test_perturbed = X_pastdays_test.copy()
    x_test_perturbed[feature_name] = np.random.permutation(x_test_perturbed[feature_name])
    # Make predictions with the perturbed data
    y_pred_perturbed = rf_regressor.predict(x_test_perturbed)
    # Calculate the mean squared error with the perturbed data
    perturbed_mse = mean_squared_error(y_test, y_pred_perturbed)
    # Calculate %Inc MSE as a percentage increase in MSE
    percent_inc_mse = ((perturbed_mse - baseline_mse) / baseline_mse) * 100
    percent_inc_mse_list.append(percent_inc_mse)

feature_importance_df['%Inc MSE'] = percent_inc_mse_list
feature_importance_df = feature_importance_df.sort_values(by='%Inc MSE', ascending=False)
print(feature_importance_df)
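For the combined effect, here is a minimal sketch of one possible approach, not a definitive method: shuffle all lag columns of a variable together with the same row order, repeat the shuffle several times, and report the mean and spread of %Inc MSE per variable. A negative value for a single column means the shuffled data happened to give a lower MSE than the baseline, which usually suggests the column contributes little and the difference is within shuffle noise; averaging over repeats helps to check that. The helper names `group_lag_columns` and `grouped_percent_inc_mse` are hypothetical, and the sketch assumes lag columns are named `<base>_<integer>` and that the model only needs a `predict` method.

```python
import re
from collections import defaultdict

import numpy as np
import pandas as pd


def group_lag_columns(columns):
    """Map each base variable to its columns, e.g. temp -> [temp, temp_1, temp_2]."""
    groups = defaultdict(list)
    for col in columns:
        # Assumption: lag columns are named <base>_<integer>
        base = re.sub(r"_\d+$", "", col)
        groups[base].append(col)
    return dict(groups)


def grouped_percent_inc_mse(model, X, y, n_repeats=20, seed=0):
    """Mean and std of %Inc MSE when all lags of a variable are shuffled together."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    baseline_mse = np.mean((y - model.predict(X)) ** 2)
    rows = []
    for base, cols in group_lag_columns(X.columns).items():
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # One row order for the whole group: the lags of a variable stay
            # consistent with each other, but their link to y is broken.
            order = rng.permutation(len(X))
            X_perm[cols] = X[cols].to_numpy()[order]
            perm_mse = np.mean((y - model.predict(X_perm)) ** 2)
            increases.append((perm_mse - baseline_mse) / baseline_mse * 100)
        rows.append({
            "Variable": base,
            "Columns": len(cols),
            "%Inc MSE (mean)": np.mean(increases),
            "%Inc MSE (std)": np.std(increases),
        })
    return pd.DataFrame(rows).sort_values("%Inc MSE (mean)", ascending=False)
```

With the names from the code above, this could be called as `grouped_percent_inc_mse(rf_regressor, X_pastdays_test, y_test)`. A mean that is small relative to its standard deviation would be a hint that the variable's apparent importance, positive or negative, is mostly noise.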