bioinformatics_student


Error with pandas dataframe (needs to be 1-dimensional)

I am trying to compute conformal predictions for my model on my data, but I get the following error at icp.calibrate:

Exception: Data must be 1-dimensional

The full traceback is below. Unfortunately, I am not sure what it tells me about my code. I am using a pandas DataFrame for the data.

Code:

from sklearn.tree import DecisionTreeRegressor
from nonconformist.cp import IcpRegressor
from nonconformist.base import RegressorAdapter
from nonconformist.nc import RegressorNc, AbsErrorErrFunc, RegressorNormalizer, NcFactory
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

# -----------------------------------------------------------------------------
# Setup training, calibration and test data
# -----------------------------------------------------------------------------
df = pd.read_csv("prepared_data.csv")


# Initial split into train/validation data using the 'split' column
train = df.loc[df['split']== 'train']
valid = df.loc[df['split']== 'valid']

# Proper Validation Set (Split the Validation set into features and target)
X_valid = valid.drop(['expression'], axis = 1)
y_valid = valid.drop(columns = ['new_host', 'split', 'sequence'])

# Create Training Set (Split the Training set into features and target)
X_train = valid.drop(['expression'], axis = 1)
y_train = valid.drop(columns = ['new_host', 'split', 'sequence'])

# Split Training set into further training set and calibration set
X_train, X_cal, y_train, y_cal = train_test_split(X_train, y_train, test_size=0.2)

# -----------------------------------------------------------------------------
# Train and calibrate underlying model
# -----------------------------------------------------------------------------
underlying_model = RegressorAdapter(DecisionTreeRegressor(min_samples_leaf=5))
print("Underlying model loaded")
model = RegressorAdapter(underlying_model)
nc = RegressorNc(model, AbsErrorErrFunc())

print("Nonconformity Function Applied")
icp = IcpRegressor(nc)  # Create an inductive conformal Regressor
print("ICP Regressor Created")

#Dataset Review
print('{} instances, {} features, {} classes'.format(y_train.size,
                                                   X_train.shape[1],
                                                   np.unique(y_train).size))

icp.fit(X_train, y_train)
icp.calibrate(X_cal, y_cal)

#Example Dataframe

new_host  split     sequence    expression
FALSE     train     AQVPYGVS    0.039267878
FALSE     train     ASVPYGVSI   0.039267878
FALSE     train     STNLYGSGR   0.261456561
FALSE     valid     NLYGSGLVR   0.265188519
FALSE     valid     SLGPSNLYG   0.419680588
FALSE     valid     ATSLGTTNG   0.145710993

I've tried splitting the dataset in various ways, but I keep running into this problem. I want to split the data into training and test sets according to each observation's value in the split column, and then, in a second step, split the training set further into training and calibration sets. My features are X_train and my target is y_train.
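The two-step split described above can be sketched with pandas alone, using the example rows shown in the question. Two assumptions are labelled in the comments: the split column is treated as bookkeeping and left out of the features, and DataFrame.sample stands in for scikit-learn's train_test_split so the sketch has no extra dependency.

```python
import pandas as pd

# The example rows from the question
df = pd.DataFrame({
    "new_host": [False] * 6,
    "split": ["train", "train", "train", "valid", "valid", "valid"],
    "sequence": ["AQVPYGVS", "ASVPYGVSI", "STNLYGSGR",
                 "NLYGSGLVR", "SLGPSNLYG", "ATSLGTTNG"],
    "expression": [0.039267878, 0.039267878, 0.261456561,
                   0.265188519, 0.419680588, 0.145710993],
})

# Step 1: split rows by the value of the 'split' column
train = df.loc[df["split"] == "train"]
valid = df.loc[df["split"] == "valid"]

# Assumption: 'split' is only bookkeeping, so it is not used as a feature
feature_cols = ["new_host", "sequence"]
X_train, y_train = train[feature_cols], train["expression"]  # y_train is a 1-D Series
X_valid, y_valid = valid[feature_cols], valid["expression"]

# Step 2: hold out roughly 20% of the training rows for calibration
# (DataFrame.sample is a stand-in for train_test_split here)
cal_idx = X_train.sample(frac=0.2, random_state=0).index
X_cal, y_cal = X_train.loc[cal_idx], y_train.loc[cal_idx]
X_fit, y_fit = X_train.drop(index=cal_idx), y_train.drop(index=cal_idx)
```

With scikit-learn installed, train_test_split(X_train, y_train, test_size=0.2) does the same job; what matters is that the target stays a 1-D Series. Note that the string-valued sequence column would still need a numeric encoding before a DecisionTreeRegressor can use it.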

#Traceback Error

Traceback (most recent call last)
<ipython-input-68-083e5dd0b0b6> in <module>
      4 print(type(y_cal))
      5 print(y_cal.index)
----> 6 icp.calibrate(X_cal, y_cal)
      7 print("ICP Calibrated")

~/.local/lib/python3.8/site-packages/nonconformist/icp.py in calibrate(self, x, y, increment)
    102                 else:
    103                         self.categories = np.array([0])
--> 104                         cal_scores = self.nc_function.score(self.cal_x, self.cal_y)
    105                         self.cal_scores = {0: np.sort(cal_scores)[::-1]}
    106 

~/.local/lib/python3.8/site-packages/nonconformist/nc.py in score(self, x, y)
    370                         norm = np.ones(n_test)
    371 
--> 372                 return self.err_func.apply(prediction, y) / norm
    373 
    374 

~/.local/lib/python3.8/site-packages/nonconformist/nc.py in apply(self, prediction, y)
    156 
    157         def apply(self, prediction, y):
--> 158                 return np.abs(prediction - y)
    159 
    160         def apply_inverse(self, nc, significance):

~/.local/lib/python3.8/site-packages/pandas/core/series.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    633 
    634         # for binary ops, use our custom dunder methods
--> 635         result = ops.maybe_dispatch_ufunc_to_dunder_op(
    636             self, ufunc, method, *inputs, **kwargs
    637         )

pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()

~/.local/lib/python3.8/site-packages/pandas/core/ops/common.py in new_method(self, other)
     62         other = item_from_zerodim(other)
     63 
---> 64         return method(self, other)
     65 
     66     return new_method

~/.local/lib/python3.8/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
    503         result = arithmetic_op(lvalues, rvalues, op, str_rep)
    504 
--> 505         return _construct_result(left, result, index=left.index, name=res_name)
    506 
    507     wrapper.__name__ = op_name

~/.local/lib/python3.8/site-packages/pandas/core/ops/__init__.py in _construct_result(left, result, index, name)
    476     # We do not pass dtype to ensure that the Series constructor
    477     #  does inference in the case where `result` has object-dtype.
--> 478     out = left._constructor(result, index=index)
    479     out = out.__finalize__(left)
    480 

~/.local/lib/python3.8/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    303                     data = data.copy()
    304             else:
--> 305                 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
    306 
    307                 data = SingleBlockManager(data, index, fastpath=True)

~/.local/lib/python3.8/site-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional


Answers (1)

dm2


pandas.DataFrame.drop() returns a pandas.DataFrame, which is inherently 2-dimensional. So when you assign y_train = valid.drop(...), you still have a 2-dimensional object, even though it contains only one column. A pandas.Series, on the other hand, is 1-dimensional, and you get one by selecting a single column by label (i.e. valid['expression'] returns a 1-dimensional pandas.Series).
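A minimal sketch of the difference, built from the three valid rows of the example DataFrame in the question:

```python
import pandas as pd

# The three 'valid' rows from the example DataFrame in the question
valid = pd.DataFrame({
    "new_host": [False, False, False],
    "split": ["valid", "valid", "valid"],
    "sequence": ["NLYGSGLVR", "SLGPSNLYG", "ATSLGTTNG"],
    "expression": [0.265188519, 0.419680588, 0.145710993],
})

# drop() always returns a DataFrame, even when only one column is left
y_frame = valid.drop(columns=["new_host", "split", "sequence"])
print(type(y_frame).__name__, y_frame.shape)    # DataFrame (3, 1)

# Selecting a single column by label returns a 1-dimensional Series
y_series = valid["expression"]
print(type(y_series).__name__, y_series.shape)  # Series (3,)

# An existing one-column DataFrame can also be squeezed into a Series
y_squeezed = y_frame.squeeze("columns")
```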

Change y_train = valid.drop(...) to y_train = valid['expression'] (and do the same for y_valid), and it should work.
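Why the one-column target only blows up at icp.calibrate: according to the traceback, AbsErrorErrFunc.apply computes np.abs(prediction - y). If one operand is 1-D with n values and the other is 2-D with shape (n, 1), NumPy broadcasting silently produces an (n, n) matrix instead of n scores, and pandas then refuses to wrap that 2-D result in a Series. The sketch below reproduces the mechanism with plain arrays; the numbers are made-up stand-ins, not real model predictions.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for 3 predictions and 3 true target values
prediction = np.array([0.10, 0.20, 0.30])   # 1-D, shape (3,)
y_2d = np.array([[0.04], [0.26], [0.42]])   # 2-D, shape (3, 1), like a one-column DataFrame
y_1d = y_2d.ravel()                         # 1-D, shape (3,), like valid['expression']

# Broadcasting (3,) against (3, 1) yields a (3, 3) matrix, not 3 scores
bad = np.abs(prediction - y_2d)
print(bad.shape)  # (3, 3)

# pandas cannot build a 1-D Series from that 2-D result
try:
    pd.Series(bad)
except Exception as exc:  # older pandas raises Exception, newer raises ValueError
    print(type(exc).__name__, exc)

# With a 1-D target, each calibration example gets exactly one score
good = np.abs(prediction - y_1d)
print(good.shape)  # (3,)
```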

Also, FYI, you are building X_train and y_train from the valid DataFrame; I suspect you meant to use the train DataFrame.

