Shane Fraley

Reputation: 11

Missing Formula for Plot of SVM model

I am trying to run the code below. Everything works until I try to create the plot.

# Install the package that provides the Support Vector Machine algorithm
install.packages("e1071")
# Load the package (if this fails, tick e1071 in the Packages tab)
library(e1071)


# Choose File
diabetes <- read.csv(file.choose(), na.strings = "?")
View(diabetes)

##### Data Preprocessing
# Count number of rows with missing data
sum(!complete.cases(diabetes))


# Summary of data set
summary(diabetes)

str(diabetes)


# Recode readmitted: "NO" and ">30" become 0, "<30" becomes 1
diabetes$readmitted<-as.character(diabetes$readmitted)
diabetes$readmitted[diabetes$readmitted== "NO"] <- "0"
diabetes$readmitted[diabetes$readmitted== "<30"] <- "1"
diabetes$readmitted[diabetes$readmitted== ">30"] <- "0"
diabetes$readmitted<-factor(diabetes$readmitted)

str(diabetes$readmitted)
summary(diabetes$readmitted)


# Removal of insignificant variables
diabetes$encounter_id<-NULL
diabetes$patient_nbr<-NULL
diabetes$weight<-NULL # Weight had too many missing values to be a part of our model
diabetes$payer_code<-NULL
diabetes$medical_specialty<-NULL
diabetes$nateglinide<-NULL
diabetes$chlorpropamide<-NULL
diabetes$acetohexamide<-NULL
diabetes$tolbutamide<-NULL
diabetes$acarbose<-NULL
diabetes$miglitol<-NULL
diabetes$troglitazone<-NULL
diabetes$tolazamide<-NULL
diabetes$examide<-NULL
diabetes$citoglipton<-NULL
diabetes$glyburide.metformin<-NULL
diabetes$glipizide.metformin<-NULL
diabetes$glimepiride.pioglitazone<-NULL
diabetes$metformin.rosiglitazone<-NULL
diabetes$metformin.pioglitazone<-NULL

# Change variables to be factors
diabetes$admission_type_id<-factor(diabetes$admission_type_id)
diabetes$discharge_disposition_id<-factor(diabetes$discharge_disposition_id)
diabetes$admission_source_id<-factor(diabetes$admission_source_id)

str(diabetes)

# Summary after data pre-processing
summary(diabetes)


# Set Seed and split data set into training and test data
set.seed(1234)
ind <- sample(2, nrow(diabetes), replace = TRUE, prob = c(0.7, 0.3))
train.data <- diabetes[ind == 1, ]
test.data <- diabetes[ind == 2, ]

# Create Model using readmitted as dependent variable
model1<-readmitted~.
model1<-svm(readmitted~., data=train.data)
summary(model1)

plot(model1, diabetes, type='C-classification', kernel='radial')

### I am also having trouble here making the tables###########

# Create table of model vs training data in confusion matrix 
table(predict(model1), train.data$readmitted)

# Pull Test data to get confusion matrix
testPred <- predict(model1, newdata = test.data)
table (testPred, test.data$readmitted)

# Create second model using select readmitted and select variables
model2<-readmitted~race + gender + age + admission_type_id + discharge_disposition_id + time_in_hospital + num_lab_procedures + num_procedures + num_medications + number_outpatient + number_inpatient + number_emergency + number_diagnoses + change + diabetesMed
model2<-svm(model2, data=train.data)
summary(model2)

### Also having trouble here making the second table#########

# Create table using second model and training data
table(predict(model2), train.data$readmitted)
testPred2 <- predict(model2, newdata = test.data)
table (testPred2, test.data$readmitted)

I have been playing around with the plot and the tables and can't seem to get anything to work.

I have been testing this on a data set with 9,999 rows, but my real data set has 107,000 rows, so it takes a long time to run and find out I am wrong. Any help would be greatly appreciated. Thank you.

Upvotes: 0

Views: 1649

Answers (1)

GD_N

Reputation: 253

I would need the data you are working with to say for sure. I have run into these kinds of problems with large data sets.

  • For large data sets, I prefer the caret package (library(caret)); it supports parallel processing and handles large tuning grids.
  • For plots, library(hexbin) or the tabplot package in R might help you.

The suggestions above are for processing your data faster, so that you can use the whole data set, and for visualizing large data sets.

I am not sure what error you are getting from plot. Please post the error message you are getting.
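
Going only by the question title, the error is probably "missing formula.", which plot() raises for an svm object when no formula is given and the data has more than two predictors. The table() calls may be failing for a different reason: svm() and predict() drop rows containing NA by default, so the predictions can be shorter than the readmitted column. That second point is a guess without seeing the error. Here is a minimal sketch of both fixes, using the built-in iris data as a stand-in for the diabetes data (the names toy, cc, and fit are just for illustration):

```r
library(e1071)

# iris stands in for the diabetes data (assumption, for illustration only).
set.seed(1234)
toy <- iris
toy$Sepal.Width[sample(nrow(toy), 10)] <- NA  # inject some missing values

# Keep only complete rows so predictions and labels have the same length.
cc <- toy[complete.cases(toy), ]
ind <- sample(2, nrow(cc), replace = TRUE, prob = c(0.7, 0.3))
train.data <- cc[ind == 1, ]
test.data  <- cc[ind == 2, ]

fit <- svm(Species ~ ., data = train.data)

# Confusion matrices: both arguments now have one entry per row.
train.tab <- table(predicted = predict(fit), actual = train.data$Species)
test.tab  <- table(predicted = predict(fit, newdata = test.data),
                   actual = test.data$Species)

# With more than two predictors, plot() needs a formula naming the two
# variables to draw; slice holds the remaining predictors at fixed values.
plot(fit, train.data, Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = 3, Sepal.Length = 5))
```

For the diabetes data the same pattern would mean passing a formula with two numeric columns, for example num_medications ~ num_lab_procedures, as the third argument to plot(). The type and kernel arguments belong to svm(), not to plot(). With many factor predictors the resulting 2-D slice may not be very informative.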

Upvotes: 1
