I am using the function shown further down to collect the models I trained. When I then call evaluate_model(final_results['12']['df_pred_12']['top_3_models_tuned'][0]), I get this error:
'ValueError: Feature shape mismatch, expected: xxx, got yyy'
When I skip the function and just run the following:
exp_name_24 = setup(data=df_pred_24,
                    target='Result_24',
                    categorical_features=categorical_features)
set_config('seed', 42)
print(get_config('seed'))
top_3_model_24 = compare_models(n_select=3)
tuned_model_24 = [tune_model(model) for model in top_3_model_24]
print(evaluate_model(tuned_model_24[0]))
evaluate_model() works fine in that case. I believe I need to use the stored setup in some way, but I can't work out what to do with it or how to proceed. I would be very grateful for any ideas on how to tackle this. Thanks in advance!
This is the function I use to store all the models and related objects in a nested dictionary:
import os
from typing import Any, Optional

import pandas as pd
# setup, set_config, compare_models, tune_model and save_model come from the
# PyCaret module in use (pycaret.classification or pycaret.regression).


def train_tune_models(
    dataframes: dict[str, pd.DataFrame],
    targets: list[int],
    categorical_features: list[str],
    n_select: int = 3,
    save_models: bool = False,
    save_dir: Optional[str] = None
) -> dict[str, dict[str, Any]]:
    """Train and tune multiple machine learning models for different targets and dataframes.

    Parameters
    ----------
    dataframes : dict[str, pd.DataFrame]
        A dictionary of dataframes, where keys are dataframe names and values are the dataframes
    targets : list[int]
        A list of target values representing different prediction timeframes
    categorical_features : list[str]
        A list of column names in the dataframes that should be treated as categorical features
    n_select : int, optional
        The number of top models to select, tune, and potentially save for each scenario, by default 3
    save_models : bool, optional
        If True, the tuned models will be saved, by default False
    save_dir : str, optional
        The directory path where models should be saved if save_models is True, by default None

    Returns
    -------
    dict[str, dict[str, Any]]
        A nested dictionary structure containing the results:
        - The outer dictionary uses target values (as strings) as keys.
        - Each inner dictionary uses dataframe names as keys.
        - The values of the inner dictionary contain:
            - 'target': The target value
            - 'setup': The object returned by setup()
            - 'top_{n_select}_models': List of top n_select models before tuning
            - 'top_{n_select}_models_tuned': List of top n_select models after tuning
    """
    if save_models and save_dir is None:
        raise ValueError("save_dir must be provided when save_models is True")

    results = {}
    for target in targets:
        target_results = {}
        for df_name, df in dataframes.items():
            # Skip dataframes that do not match the current target
            if not df_name.startswith(f'df_pred_{target}'):
                continue
            result = {}
            # Each setup() call replaces the previously active experiment
            s = setup(data=df,
                      target=f'Result_{target}',
                      categorical_features=categorical_features)
            set_config('seed', 42)
            models = compare_models(n_select=n_select)
            result['target'] = target
            result['setup'] = s
            result[f'top_{n_select}_models'] = models
            tuned_models = [tune_model(model) for model in models]
            result[f'top_{n_select}_models_tuned'] = tuned_models
            if save_models:
                # Create the output directory if it does not exist yet
                os.makedirs(save_dir, exist_ok=True)
                for i, model in enumerate(tuned_models):
                    model_name = f"{model.__class__.__name__}_{df_name}_{i}"
                    model_path = os.path.join(save_dir, model_name)
                    save_model(model, model_path)
            target_results[df_name] = result
        results[str(target)] = target_results
    return results
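
One possible cause, offered as an assumption rather than a confirmed diagnosis: PyCaret's functional API keeps a single active experiment, so once the loop has finished, evaluate_model() works against the data from the last setup() call and not the one a given model was trained on. The sketch below assumes the object-oriented API in PyCaret 3.x (ClassificationExperiment in pycaret.classification, or RegressionExperiment in pycaret.regression) and keeps one experiment per dataframe next to its models. The names train_with_experiments and experiment_factory and the 'experiment' key are hypothetical, introduced only for illustration; session_id=42 stands in for set_config('seed', 42).

```python
from typing import Any, Callable


def train_with_experiments(
    dataframes: dict[str, Any],
    targets: list[int],
    categorical_features: list[str],
    experiment_factory: Callable[[], Any],  # hypothetical: e.g. ClassificationExperiment
    n_select: int = 3,
) -> dict[str, dict[str, dict[str, Any]]]:
    """Variant of train_tune_models that stores each experiment with its models."""
    results: dict[str, dict[str, dict[str, Any]]] = {}
    for target in targets:
        target_results = {}
        for df_name, df in dataframes.items():
            # Same name filter as in the original function
            if not df_name.startswith(f"df_pred_{target}"):
                continue
            # A fresh experiment object per dataframe, so no state is shared
            exp = experiment_factory()
            exp.setup(
                data=df,
                target=f"Result_{target}",
                categorical_features=categorical_features,
                session_id=42,  # assumption: replaces set_config('seed', 42)
            )
            models = exp.compare_models(n_select=n_select)
            tuned = [exp.tune_model(m) for m in models]
            target_results[df_name] = {
                "target": target,
                "experiment": exp,  # keep this to evaluate the models later
                f"top_{n_select}_models": models,
                f"top_{n_select}_models_tuned": tuned,
            }
        results[str(target)] = target_results
    return results
```

With PyCaret installed, this could be called as final_results = train_with_experiments(dataframes, [12, 24], categorical_features, ClassificationExperiment). Evaluation would then go through the stored experiment: entry = final_results['12']['df_pred_12'], followed by entry['experiment'].evaluate_model(entry['top_3_models_tuned'][0]). If the Result_* columns are numeric, RegressionExperiment would be the class to pass instead.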
Upvotes: 0
Views: 87