Reputation: 1855
I am building a project to predict cancer from CSV data. I have split the cancer dataset into two files, X_data.csv and Y_data.csv. Please review the code below if you are interested in helping me solve this problem.
Import all needed libraries and sublibraries:
import tensorflow as tf
import keras.backend as K
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
from keras.utils import to_categorical
import keras
import numpy as np
from keras.layers import BatchNormalization
from keras.layers import Dropout
from keras import regularizers
import pandas as pd
import sklearn
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='retina'
Import the input (x) and output (y) data, and assign them to df1 and df2:
df1 = pd.read_csv('X_data.csv')
df2 = pd.read_csv('Y_data.csv')
Scale input data:
df1 = preprocessing.scale(df1)  # I get an error here
Scaling error is given below:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-aec70d746687> in <module>
1 # Scale input data
2
----> 3 df1 = preprocessing.scale(df1)
~/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75
~/anaconda3/lib/python3.8/site-packages/sklearn/preprocessing/_data.py in scale(X, axis, with_mean, with_std, copy)
139
140 """ # noqa
--> 141 X = check_array(X, accept_sparse='csc', copy=copy, ensure_2d=False,
142 estimator='the scale function', dtype=FLOAT_DTYPES,
143 force_all_finite='allow-nan')
~/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75
~/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
597 array = array.astype(dtype, casting="unsafe", copy=False)
598 else:
--> 599 array = np.asarray(array, order=order, dtype=dtype)
600 except ComplexWarning:
601 raise ValueError("Complex data not supported\n"
~/anaconda3/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87
ValueError: could not convert string to float: 'discrete'
Upvotes: -1
Views: 56
Reputation: 795
The last line of the traceback states the problem literally:
ValueError: could not convert string to float: 'discrete'
If you print your data (df1.head()) you'll see there are some string values, just as the error suggests, which preprocessing.scale cannot handle.
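A quick way to find the offending columns is to ask pandas which ones are not numeric. A minimal sketch, using a made-up DataFrame in place of your X_data.csv (the column names here are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for X_data.csv: one numeric and one string column
df1 = pd.DataFrame({
    "radius": [14.1, 20.6, 12.4],
    "feature_type": ["discrete", "discrete", "continuous"],
})

# Columns that are not numeric are exactly what scale() chokes on
string_cols = df1.select_dtypes(exclude="number").columns.tolist()
print(string_cols)  # -> ['feature_type']
```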
So you must perform data cleaning first (convert strings to int/float, handle any missing data, etc.). Look into something like sklearn's LabelEncoder, or a one-hot encoder, to take care of the string-to-number issue.
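Here is a hedged sketch of both approaches, again using a made-up DataFrame with hypothetical column names in place of the real X_data.csv:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, scale

# Hypothetical stand-in for X_data.csv
df1 = pd.DataFrame({
    "radius": [14.1, 20.6, 12.4],
    "feature_type": ["discrete", "discrete", "continuous"],
})

# Option 1: LabelEncoder maps each distinct string to an integer code
le = LabelEncoder()
df1["feature_type"] = le.fit_transform(df1["feature_type"])

# Option 2 (alternative): one-hot encode the string column instead
# df1 = pd.get_dummies(df1, columns=["feature_type"])

# Every column is numeric now, so scaling succeeds
X = scale(df1)
print(X.shape)  # (3, 2)
```

Note that LabelEncoder imposes an arbitrary ordering on the categories, so one-hot encoding is often the safer choice for nominal features fed to a neural network.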
Upvotes: 1