Ali Kılınç

Reputation: 11

CatBoost error: "TypeError: Singleton array cannot be considered a valid collection"

I am trying to use CatBoostRegressor in my code for the first time, and it has been a struggle. I ran into several errors and solved them, but this last one persists whatever I try.

To check whether the input data is the cause, I deleted almost every feature from my dataset. All that remains is a few numerical columns, listed in num_cols, and one categorical column (consisting of strings, not numbers), listed in cat_cols. The error still persists.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 395 entries, 0 to 394
Data columns (total 5 columns):
T_CUST_TRI 395 non-null int32
TRIESTE_CNT 395 non-null int32
LANECNT 395 non-null int32
TRADELANE 395 non-null category
TIME_DUE 395 non-null int32
dtypes: category(1), int32(4)

I am consistently getting this error at the end. Thanks for your help and time:

  File "C:\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 650, in fit
    X, y, groups = indexable(X, y, groups)
  File "C:\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 248, in indexable
    check_consistent_length(*result)
  File "C:\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 208, in check_consistent_length
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "C:\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 208, in <listcomp>
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "C:\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 152, in _num_samples
    " a valid collection." % x)
TypeError: Singleton array array(<catboost.core.Pool object at 0x0000025CF69CFD68>, dtype=object) cannot be considered a valid collection.

if feature_selection == 1:

    models = dict()
    
    paramsrf = {
            'est__max_depth':[5, 9, 18, 32],
            'est__n_estimators': [10, 50, 100, 200],
            'est__min_samples_split': [0.1, 1.0, 2],
            'est__min_samples_leaf': [0.1, 0.5, 1]
            }
    
    paramscat = {
            'est__depth': np.linspace(4,10,4,endpoint=True),
            'est__iterations':[250,100,500,1000],
            'est__learning_rate':[0.001,0.01,0.1,0.3],
            'est__bagging_temperature': [0,5,10,25,50],
            'est__border_count':[5,10,20,50,100]
            }
    
    #models['rf'] = [RandomForestRegressor(), paramsrf]
    models['catb'] = [CatBoostRegressor(cat_features = cat_cols, verbose = 0), paramscat]
    
    for key, value in models.items():
                
        start_time = timeit.default_timer()
        
        scorer = ['neg_mean_squared_error', 'neg_mean_absolute_error', 'r2']
        
        if key == 'catb':
            
            preprocessor = ColumnTransformer(transformers = [('num', MinMaxScaler(feature_range = (0,1)), num_cols)])
            
            all_pipe = Pipeline(steps = [('prep', preprocessor), ('est', value[0])])
        
            search_space = value[1]
                        
            pooled = Pool(data = FeaturesData(
                                                num_feature_data = np.array(df_x[num_cols].values, dtype = np.float32), 
                                                cat_feature_data = np.array(df_x[cat_cols].values, dtype= object), 
                                                num_feature_names = num_cols, 
                                                cat_feature_names = cat_cols),
                         label =  np.array(df_y.values.ravel(), dtype = np.float32))
            
            grid_search = GridSearchCV(all_pipe, search_space, cv=5, verbose=1, refit = 'neg_mean_squared_error', scoring = scorer, return_train_score = True, n_jobs = -1)

            grid_search.fit(pooled)
            

Upvotes: 0

Views: 2197

Answers (1)

AzyCrw4282

Reputation: 7744

This error could happen for a number of reasons. For instance,

  1. A variable definition masking your function declaration.
  2. Passing a positional argument as a keyword argument.
  3. A column name in your data being the same as an attribute or method of the object containing the data.

I am inclined to think that your error is most likely related to the second point: somewhere in your code you may be passing a keyword argument that is not needed. I would recommend working by trial and error, adding or removing lines of code to identify where the error comes from.

You can also look for solutions here
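
One more thing that may be worth checking is the traceback itself: it shows scikit-learn wrapping the catboost.core.Pool object in a zero-dimensional object array, which has no sample axis to split into folds. The sketch below reproduces that conversion with NumPy only. FakePool and describe are hypothetical names made up for illustration, not part of catboost or scikit-learn, and the 395 x 4 matrix only mimics the shape of the numerical columns in the question.

```python
import numpy as np


class FakePool:
    """Hypothetical stand-in for an object NumPy cannot unpack into rows."""


def describe(obj):
    """Return (ndim, shape) of obj after conversion to a NumPy array."""
    arr = np.asarray(obj)
    return arr.ndim, arr.shape


# An opaque object becomes a 0-d object array: one item, no sample axis.
pool_ndim, pool_shape = describe(FakePool())

# A 2-D feature matrix keeps its sample axis (rows) after conversion.
X = np.zeros((395, 4), dtype=np.float32)
x_ndim, x_shape = describe(X)

print(pool_ndim, pool_shape)
print(x_ndim, x_shape)
```

If that reading is right, the search needs the features and the target as separate array-likes rather than a single Pool, for example grid_search.fit(df_x, df_y.values.ravel()). I have not run this against your data, so treat it as a direction to try rather than a confirmed fix.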

Upvotes: 0
