Reputation: 531
I am trying to build a deep learning model that predicts whether a person is a chronic kidney disease (CKD) patient. How should I design the neural network? How many neurons should each layer have? Or is there another way to do this in Keras? The dataset link: https://github.com/Samar-080301/Python_Project/blob/master/ckd_full.csv
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import os
from matplotlib import pyplot as plt
os.chdir(r'C:\Users\samar\OneDrive\desktop\projects\Chronic_Kidney_Disease')
os.getcwd()
x=pd.read_csv('ckd_full.csv')
y=x[['class']]
y['class']=y['class'].replace(to_replace=(r'ckd',r'notckd'), value=(1,0))
x=x.drop(columns=['class'])
x['rbc']=x['rbc'].replace(to_replace=(r'normal',r'abnormal'), value=(1,0))
x['pcc']=x['pcc'].replace(to_replace=(r'present',r'notpresent'), value=(1,0))
x['ba']=x['ba'].replace(to_replace=(r'present',r'notpresent'), value=(1,0))
x['pc']=x['pc'].replace(to_replace=(r'normal',r'abnormal'), value=(1,0))
x['htn']=x['htn'].replace(to_replace=(r'yes',r'no'), value=(1,0))
x['dm']=x['dm'].replace(to_replace=(r'yes',r'no'), value=(1,0))
x['cad']=x['cad'].replace(to_replace=(r'yes',r'no'), value=(1,0))
x['pe']=x['pe'].replace(to_replace=(r'yes',r'no'), value=(1,0))
x['ane']=x['ane'].replace(to_replace=(r'yes',r'no'), value=(1,0))
x['appet']=x['appet'].replace(to_replace=(r'good',r'poor'), value=(1,0))
x[x=="?"]=np.nan
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.01)
#begin the model
model=keras.models.Sequential()
model.add(keras.layers.Dense(128,input_dim = 24, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128,activation=tf.nn.relu)) # hidden layer with 128 nodes and ReLU activation
model.add(tf.keras.layers.Dense(128,activation=tf.nn.relu)) # hidden layer with 128 nodes and ReLU activation
model.add(tf.keras.layers.Dense(128,activation=tf.nn.relu)) # hidden layer with 128 nodes and ReLU activation
model.add(tf.keras.layers.Dense(128,activation=tf.nn.relu)) # hidden layer with 128 nodes and ReLU activation
model.add(tf.keras.layers.Dense(128,activation=tf.nn.relu)) # hidden layer with 128 nodes and ReLU activation
model.add(tf.keras.layers.Dense(2,activation=tf.nn.softmax)) # output layer with 2 nodes and softmax activation
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) # set the optimizer, loss and metric
model.fit(xtrain,ytrain,epochs=5) # train the model
model.save('Nephrologist') # save the model with a unique name
myModel=tf.keras.models.load_model('Nephrologist') # load the saved model back
prediction=myModel.predict((xtest))
C:\Users\samar\anaconda3\lib\site-packages\ipykernel_launcher.py:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
if sys.path[0] == '':
Epoch 1/5
396/396 [==============================] - 0s 969us/sample - loss: nan - acc: 0.3561
Epoch 2/5
396/396 [==============================] - 0s 343us/sample - loss: nan - acc: 0.3763
Epoch 3/5
396/396 [==============================] - 0s 323us/sample - loss: nan - acc: 0.3763
Epoch 4/5
396/396 [==============================] - 0s 283us/sample - loss: nan - acc: 0.3763
Epoch 5/5
396/396 [==============================] - 0s 303us/sample - loss: nan - acc: 0.3763
Upvotes: 0
Views: 252
Reputation: 667
Here is the structure that I achieved 100% test accuracy with:
model=keras.models.Sequential()
model.add(keras.layers.Dense(200,input_dim = 24, activation=tf.nn.tanh))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) # set the optimizer, loss and metric
xtrain_tensor = tf.convert_to_tensor(xtrain, dtype=tf.float32)
ytrain_tensor = tf.convert_to_tensor(ytrain, dtype=tf.float32)
model.fit(xtrain_tensor , ytrain_tensor , epochs=500, batch_size=128, validation_split = 0.15, shuffle=True, verbose=2) # train the model
results = model.evaluate(xtest, ytest, batch_size=128)
Output:
3/3 - 0s - loss: 0.2560 - accuracy: 0.9412 - val_loss: 0.2227 - val_accuracy: 0.9815
Epoch 500/500
3/3 - 0s - loss: 0.2225 - accuracy: 0.9673 - val_loss: 0.2224 - val_accuracy: 0.9815
1/1 [==============================] - 0s 0s/step - loss: 0.1871 - accuracy: 1.0000
The last line represents the evaluation of the model on the test dataset. Seems like it generalized well :)
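For reference, with a single sigmoid output the reported accuracy comes from thresholding the predicted probabilities at 0.5 and comparing them with the labels. Here is a minimal sketch of that step with NumPy; the probabilities and labels are made-up values, not outputs of the model above:

```python
import numpy as np

# Hypothetical sigmoid outputs, shaped (n_samples, 1) like model.predict(xtest) returns
probs = np.array([[0.91], [0.08], [0.67], [0.35]])
# Hypothetical true labels: 1 = ckd, 0 = notckd
y_true = np.array([1, 0, 1, 1])

# Threshold at 0.5 to turn probabilities into class labels
y_pred = (probs.ravel() > 0.5).astype(int)

# Fraction of predictions that match the true labels
accuracy = (y_pred == y_true).mean()
print(y_pred)    # [1 0 1 0]
print(accuracy)  # 0.75
```

The same thresholding is what you would apply to the output of model.predict if you need class labels rather than probabilities.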
------------------------------------------------- Original answer below ---------------------------------------------------
I would go with a logistic regression model first in order to see if there is any predictive value to your dataset.
model=keras.models.Sequential()
model.add(keras.layers.Dense(1,input_dim = 24, activation=tf.nn.sigmoid))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) # binary_crossentropy matches the single sigmoid output
model.fit(xtrain,ytrain,epochs=100) # Might require more or fewer epochs. It depends on the amount of noise in your dataset.
If the accuracy score satisfies you, I would then try adding 1 or 2 dense hidden layers with 10 to 40 nodes each. It's important to mention that this advice is based solely on my experience.
I HIGHLY(!!!!) recommend transforming the y_label into a binary value, where 1 represents the positive class (the record belongs to a CKD patient) and 0 represents the negative class. Let me know if it works, and if it doesn't I'll also try to play with your dataset.
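A minimal sketch of that label transformation with pandas, using the 'class' column and the 'ckd' / 'notckd' values from the question. The toy frame and its stray whitespace are assumptions for illustration; I have not checked what the real CSV contains:

```python
import pandas as pd

# Toy frame standing in for ckd_full.csv; the stray whitespace is a made-up example
df = pd.DataFrame({"class": ["ckd", "notckd", " ckd", "notckd "]})

# Strip whitespace first so near-duplicate spellings map to the same label
y = df["class"].str.strip().map({"ckd": 1, "notckd": 0})

# Any value missing from the mapping becomes NaN, so check before training
assert y.notna().all()

# float32 matches what a sigmoid output with binary_crossentropy expects
y = y.astype("float32")
print(y.tolist())  # [1.0, 0.0, 1.0, 0.0]
```

Using map with an explicit dictionary makes unexpected label spellings show up as NaN instead of passing through silently.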
Upvotes: 1
Reputation: 48
It looks like the problem is in your data preprocessing. To forward-fill the missing values you can use
df = df.ffill()
You can also use feature columns to replace those long encoding steps. Example:
CATEGORICAL_COLUMNS = ['columns','which have','categorical data','like sex']
NUMERIC_COLUMNS = ['columns which have','numeric data']
feature_column = []
for items in CATEGORICAL_COLUMNS:
    # the vocabulary is the set of distinct values found in the column
    feature_column.append(tf.feature_column.categorical_column_with_vocabulary_list(items, df[items].unique()))
for items in NUMERIC_COLUMNS:
    feature_column.append(tf.feature_column.numeric_column(items, dtype=tf.float32))
Now you can pass these feature columns to your model, which should help its accuracy. More can be done in data preprocessing; here is the official documentation to help you more: TensorFlow documentation on feature columns
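To make the missing-value step concrete, here is a small sketch on a toy frame. NaN values in the inputs propagate through the network, which is consistent with the loss: nan in the question's output, so they need to be filled before training. The column values are made up, and the median fill is an alternative I am suggesting, not something from the question:

```python
import pandas as pd

# Toy frame standing in for the CKD data; "?" marks a missing value as in the question
df = pd.DataFrame({
    "age": ["48", "?", "62", "51"],
    "bp": ["80", "70", "?", "90"],
})

# Convert the string columns to numbers; errors="coerce" turns "?" into NaN
df = df.apply(pd.to_numeric, errors="coerce")

# Forward-fill, then back-fill in case the first row of a column is missing
filled = df.ffill().bfill()

# Alternative: fill each column with its own median
median_filled = df.fillna(df.median())

print(filled["age"].tolist())         # [48.0, 48.0, 62.0, 51.0]
print(median_filled["age"].tolist())  # [48.0, 51.0, 62.0, 51.0]
```

Forward filling depends on row order, so for data where the rows are unrelated patients a per-column median (or mode for categorical columns) is often the safer choice.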
Upvotes: 1