BrieucA

Struggling to understand what to save/use to correctly call PyCaret's evaluate_model() function

I am using the function shown at the end of this post to train, tune, and collect my models. When I then call evaluate_model(final_results['12']['df_pred_12']['top_3_models_tuned'][0]), I get this error:

'ValueError: Feature shape mismatch, expected: xxx, got yyy'.

When I am not using the function and just run:

exp_name_24 = setup(data=df_pred_24,
                    target='Result_24',
                    categorical_features=categorical_features)
set_config('seed', 42)
print(get_config('seed'))
top_3_model_24 = compare_models(n_select=3)

tuned_model_24 = [tune_model(model) for model in top_3_model_24]

print(evaluate_model(tuned_model_24[0]))

In that case evaluate_model() works just fine. I believe I need to use the setup in some way, but I cannot work out what to store or how to use it afterwards. I would be very grateful for any ideas on how to tackle this. Thanks in advance!

This is the function I used to store all the models, etc. in a nested dictionary:

import os
from typing import Any, Optional

import pandas as pd
# setup, set_config, compare_models, tune_model and save_model come from the
# PyCaret module in use (for example pycaret.classification).


def train_tune_models(
    dataframes: dict[str, pd.DataFrame],
    targets: list[int],
    categorical_features: list[str],
    n_select: int = 3,
    save_models: bool = False,
    save_dir: Optional[str] = None,
) -> dict[str, dict[str, Any]]:
    """Train and tune multiple machine learning models for different targets and dataframes

    Parameters
    ----------
    dataframes : dict[str, pd.DataFrame]
        A dictionary of dataframes, where keys are dataframe names and values are the dataframes
    targets : list[int]
        A list of target values representing different prediction timeframes
    categorical_features : list[str]
        A list of column names in the dataframes that should be treated as categorical features
    n_select : int, optional
        The number of top models to select, tune, and potentially save for each scenario, by default 3
    save_models : bool, optional
        If True, the tuned models will be saved, by default False
    save_dir : str, optional
        The directory path where models should be saved if save_models is True, by default None

    Returns
    -------
    dict[str, dict[str, Any]]
        A nested dictionary structure containing the results:
        - The outer dictionary uses target values (as strings) as keys.
        - Each inner dictionary uses dataframe names as keys.
        - The values of the inner dictionary contain:
            - 'target': The target value
            - 'setup': The object returned by setup() for this dataframe
            - 'top_{n_select}_models': List of top n_select models before tuning
            - 'top_{n_select}_models_tuned': List of top n_select models after tuning
    """
    results = {}
    for target in targets:
        target_results = {}
        for df_name, df in dataframes.items():
            # Skip dataframes that do not match the current target
            if not df_name.startswith(f'df_pred_{target}'):
                continue

            result = {}
            # Each call to setup() starts a new experiment for this dataframe
            s = setup(data=df,
                      target=f'Result_{target}',
                      categorical_features=categorical_features)
            set_config('seed', 42)

            models = compare_models(n_select=n_select)
            result['target'] = target
            result['setup'] = s
            result[f'top_{n_select}_models'] = models
            tuned_models = [tune_model(model) for model in models]
            result[f'top_{n_select}_models_tuned'] = tuned_models

            if save_models:
                # Create the output directory if it does not exist yet
                os.makedirs(save_dir, exist_ok=True)
                for i, model in enumerate(tuned_models):
                    model_name = f"{model.__class__.__name__}_{df_name}_{i}"
                    model_path = os.path.join(save_dir, model_name)
                    save_model(model, model_path)

            target_results[df_name] = result
        results[str(target)] = target_results
    return results
