Python SVM Classifier - issues with input NaNs and data shape

Question

I am trying to build a binary SVM classifier on ECG data to diagnose sleep apnea. For roughly 16,000 inputs, I apply a wavelet transform, manually extract HRV features into a feature list, and feed that list to the classifier.

This worked fine on the raw data, before I added the wavelet transform as a preprocessing step. After the transform, some values in the feature list became NaN, which caused this error on the following line of code:

clf.fit(X_train, y_train)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

So I ran this step to drop the affected rows:

x = pd.DataFrame(data=X_train)
# Keep only rows that contain no NaN or infinite values
x = x[~x.isin([np.nan, np.inf, -np.inf]).any(axis=1)]

That solved the ValueError, but removing the 'faulty' inputs means that x and y_train no longer have the same number of samples:

clf.fit(x, y_train)

#error
Found input variables with inconsistent numbers of samples: [11255, 11627]

How can I remove the corresponding values from y_train so that the samples match up? Or is there a better approach?

Please let me know if you need more info on the code.
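
For reference, one possible way to keep the features and labels aligned is to compute a single boolean row mask from the features and apply it to both arrays. The sketch below is only an illustration: the helper name drop_non_finite_rows and the toy data are made up, and it assumes X_train is a 2-D array-like of numeric features whose rows are in the same order as y_train.

```python
import numpy as np

def drop_non_finite_rows(X, y):
    """Keep only the rows of X (and matching entries of y) where every feature is finite."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of rows")
    # True for rows without NaN, +inf or -inf in any feature column
    keep = np.isfinite(X).all(axis=1)
    return X[keep], y[keep]

# Toy data standing in for the HRV feature list and the apnea labels (hypothetical values).
X_train = [[0.8, 1.2], [np.nan, 0.5], [0.3, np.inf], [1.1, 0.9]]
y_train = [0, 1, 1, 0]

X_clean, y_clean = drop_non_finite_rows(X_train, y_train)
print(X_clean.shape, y_clean.shape)  # (2, 2) (2,)
```

The filtered pair could then be passed to clf.fit(X_clean, y_clean). The same filtering would presumably be needed for the test split, and it may also be worth checking why the wavelet step produces non-finite values in the first place.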
