Reputation: 488
I am trying to compute conformal predictions for my model on my data, but I get the following error, which occurs at icp.calibrate:
Exception: Data must be 1-dimensional
The full traceback is included below. Unfortunately, I am not sure what it implies about the code I wrote. My data is stored in a pandas DataFrame.
Code:
from sklearn.tree import DecisionTreeRegressor
from nonconformist.cp import IcpRegressor
from nonconformist.base import RegressorAdapter
from nonconformist.nc import RegressorNc, AbsErrorErrFunc, RegressorNormalizer, NcFactory
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
# -----------------------------------------------------------------------------
# Setup training, calibration and test data
# -----------------------------------------------------------------------------
df = pd.read_csv ("prepared_data.csv")
# Initial split into train/test data
train = df.loc[df['split']== 'train']
valid = df.loc[df['split']== 'valid']
# Proper Validation Set (Split the Validation set into features and target)
X_valid = valid.drop(['expression'], axis = 1)
y_valid = valid.drop(columns = ['new_host', 'split', 'sequence'])
# Create Training Set (Split the Training set into features and target)
X_train = valid.drop(['expression'], axis = 1)
y_train = valid.drop(columns = ['new_host', 'split', 'sequence'])
# Split Training set into further training set and calibration set
X_train, X_cal, y_train, y_cal = train_test_split(X_train, y_train, test_size =0.2)
# -----------------------------------------------------------------------------
# Train and calibrate underlying model
# -----------------------------------------------------------------------------
underlying_model = RegressorAdapter(DecisionTreeRegressor(min_samples_leaf=5))
print("Underlying model loaded")
model = RegressorAdapter(underlying_model)
nc = RegressorNc(model, AbsErrorErrFunc())
print("Nonconformity Function Applied")
icp = IcpRegressor(nc) # Create an inductive conformal Regressor
print("ICP Regressor Created")
#Dataset Review
print('{} instances, {} features, {} classes'.format(y_train.size,
                                                       X_train.shape[1],
                                                       np.unique(y_train).size))
icp.fit(X_train, y_train)
icp.calibrate(X_cal, y_cal)
#Example Dataframe
new_host  split  sequence   expression
FALSE     train  AQVPYGVS   0.039267878
FALSE     train  ASVPYGVSI  0.039267878
FALSE     train  STNLYGSGR  0.261456561
FALSE     valid  NLYGSGLVR  0.265188519
FALSE     valid  SLGPSNLYG  0.419680588
FALSE     valid  ATSLGTTNG  0.145710993
I've tried splitting the dataset in various ways, but I keep running into this error. I want to split the data into train and test sets according to each observation's split value. After that, I will split the train set into training and calibration sets in a second step, where X_train holds my features and y_train holds my target.
#Traceback Error
Traceback (most recent call last)
<ipython-input-68-083e5dd0b0b6> in <module>
4 print(type(y_cal))
5 print(y_cal.index)
----> 6 icp.calibrate(X_cal, y_cal)
7 print("ICP Calibrated")
~/.local/lib/python3.8/site-packages/nonconformist/icp.py in calibrate(self, x, y, increment)
102 else:
103 self.categories = np.array([0])
--> 104 cal_scores = self.nc_function.score(self.cal_x, self.cal_y)
105 self.cal_scores = {0: np.sort(cal_scores)[::-1]}
106
~/.local/lib/python3.8/site-packages/nonconformist/nc.py in score(self, x, y)
370 norm = np.ones(n_test)
371
--> 372 return self.err_func.apply(prediction, y) / norm
373
374
~/.local/lib/python3.8/site-packages/nonconformist/nc.py in apply(self, prediction, y)
156
157 def apply(self, prediction, y):
--> 158 return np.abs(prediction - y)
159
160 def apply_inverse(self, nc, significance):
~/.local/lib/python3.8/site-packages/pandas/core/series.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
633
634 # for binary ops, use our custom dunder methods
--> 635 result = ops.maybe_dispatch_ufunc_to_dunder_op(
636 self, ufunc, method, *inputs, **kwargs
637 )
pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()
~/.local/lib/python3.8/site-packages/pandas/core/ops/common.py in new_method(self, other)
62 other = item_from_zerodim(other)
63
---> 64 return method(self, other)
65
66 return new_method
~/.local/lib/python3.8/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
503 result = arithmetic_op(lvalues, rvalues, op, str_rep)
504
--> 505 return _construct_result(left, result, index=left.index, name=res_name)
506
507 wrapper.__name__ = op_name
~/.local/lib/python3.8/site-packages/pandas/core/ops/__init__.py in _construct_result(left, result, index, name)
476 # We do not pass dtype to ensure that the Series constructor
477 # does inference in the case where `result` has object-dtype.
--> 478 out = left._constructor(result, index=index)
479 out = out.__finalize__(left)
480
~/.local/lib/python3.8/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
303 data = data.copy()
304 else:
--> 305 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
306
307 data = SingleBlockManager(data, index, fastpath=True)
~/.local/lib/python3.8/site-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
480 elif subarr.ndim > 1:
481 if isinstance(data, np.ndarray):
--> 482 raise Exception("Data must be 1-dimensional")
483 else:
484 subarr = com.asarray_tuplesafe(data, dtype=dtype)
Exception: Data must be 1-dimensional
Upvotes: 0
Views: 904
Reputation: 4275
pandas.DataFrame.drop() returns a pandas.DataFrame object, which is inherently 2-dimensional. So when you assign y_train = valid.drop(...), you still have a 2-dimensional object (albeit one containing only 1 column). A pandas.Series object, on the other hand, is 1-dimensional, and you can get a pandas.Series by referencing a specific column (i.e. valid['expression'] will return a 1-dimensional pandas.Series).
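To make the difference concrete, here is a minimal sketch that compares the two ways of extracting the target. The toy frame is an assumption built from the three example `valid` rows in the question; the names `y_as_frame` and `y_as_series` are only illustrative.

```python
import pandas as pd

# Toy frame mirroring the question's columns (values copied from its example rows).
valid = pd.DataFrame({
    "new_host": [False, False, False],
    "split": ["valid", "valid", "valid"],
    "sequence": ["NLYGSGLVR", "SLGPSNLYG", "ATSLGTTNG"],
    "expression": [0.265188519, 0.419680588, 0.145710993],
})

# drop() returns a DataFrame, so the target stays 2-dimensional: shape (3, 1).
y_as_frame = valid.drop(columns=["new_host", "split", "sequence"])

# Selecting the column by name returns a Series, which is 1-dimensional: shape (3,).
y_as_series = valid["expression"]

print(type(y_as_frame).__name__, y_as_frame.ndim, y_as_frame.shape)
print(type(y_as_series).__name__, y_as_series.ndim, y_as_series.shape)
```

Checking `.ndim` or `.shape` on the target before calling fit or calibrate is a quick way to spot this kind of mismatch.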
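Change y_train = valid.drop(...) to y_train = valid['expression'] and it should be ok. Since y_cal is split off from y_train by train_test_split, it will then be 1-dimensional as well, which is what icp.calibrate expects.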
Also, FYI, you're building X_train and y_train from the valid DataFrame; you probably meant to use the train DataFrame.
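Putting both points together, the split might look roughly like the sketch below. It is only an illustration under assumptions: the DataFrame is rebuilt inline from the six example rows in the question instead of reading prepared_data.csv, and `random_state=0` is added purely so the sketch is reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Rows copied from the example DataFrame in the question.
df = pd.DataFrame({
    "new_host": [False] * 6,
    "split": ["train", "train", "train", "valid", "valid", "valid"],
    "sequence": ["AQVPYGVS", "ASVPYGVSI", "STNLYGSGR",
                 "NLYGSGLVR", "SLGPSNLYG", "ATSLGTTNG"],
    "expression": [0.039267878, 0.039267878, 0.261456561,
                   0.265188519, 0.419680588, 0.145710993],
})

train = df.loc[df["split"] == "train"]
valid = df.loc[df["split"] == "valid"]

# Training features and target now come from `train`, not `valid`.
X_train = train.drop(columns=["expression"])
y_train = train["expression"]  # 1-dimensional Series

X_valid = valid.drop(columns=["expression"])
y_valid = valid["expression"]  # 1-dimensional Series

# Hold out part of the training rows for calibration.
# random_state is an assumption added only for reproducibility.
X_train, X_cal, y_train, y_cal = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0
)

print(y_train.ndim, y_cal.ndim)
```

Note that X_train here still contains the string columns split and sequence; those would need to be dropped or encoded numerically before a DecisionTreeRegressor can be fitted, which this sketch does not attempt.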
Upvotes: 1