Sound
Sound

Reputation: 95

R Shiny: Creating factor variables and defining levels

I trying to create a machine learning application with Shiny.

In this application, the user can select the specifications of the input variables (via. input widgets) which will be used to give an estimate of a response variable. To do this, I have created a dataframe from the selected inputs and save it locally as a datatable.

A problem arises when I load the datatable again, since this will cause all categorical variables to become character variables. I can however change these manually with the factor() function and use the levels= specification.

The problem is that I don't want to manually do this every time a new dataset gets used, since this most likly will change the position of the categorical variables in the dataset. There will also most likly not be the same amout of categorical variables in a new dataset.

The dataframe "DATA" is the main dataset which contains the response variable in column 1.

The dataframe "test" is the dataframe constructed from the selected inputs, which will be used as the testset for prediction and will contain only the 1 specified obsevation. This dataframe will always have the response variable as the last column in the dataframe, due to how the dataframe is constructed. So the factor variable in DATA[ ,5] will always correspond to the previous column in the test dataframe: test[ ,4].

It is the test dataframe which needs the factor levels to be specified since it doesn't automatically know the amount of categories when it only contains 1 obsevation.

test[4] <- factor(test[4], levels = unique(DATA[,5]))

I'm trying to write a code that will apply the factor function on all character variables in the dataset and specify the levels no matter the position of the character variable in the dataset.

Here is the code I have written so far:

library(shiny)
library(tidyverse)
library(shinythemes)
library(data.table)
library(RCurl)
library(randomForest)
library(mlbench)
library(janitor)


# Read data
DATA <- BostonHousing

# Rearrange data so the response variable is located in column 1
DATA <- DATA[,c(names(BostonHousing)[14],names(BostonHousing)[-14])]

# Creating a simple RF model
model <- randomForest(medv ~ ., data = DATA, ntree = 500, mtry = 4, importance = TRUE)


# UI -------------------------------------------------------------------------
ui <- fluidPage(
  
  sidebarPanel(
    
    h3("Parameters Selected"),
    br(),
    tableOutput('show_inputs'),
    hr(),
    actionButton("submitbutton", label = "calculate", class = "btn btn-primary", icon("calculator")),
    hr(),
    tableOutput("tabledata")

    
  ), # End sidebarPanel
  
  mainPanel(
    
    h3("Input widgets"),
    uiOutput("select")
    
  ) # End mainPanel
  
) # End UI bracket



# Server -------------------------------------------------------------------------
server <- function(input, output, session) {
  
# Create input widgets from dataset  
  output$select <- renderUI({
    df <- req(DATA)
    tagList(map(
      names(df[-1]),
      ~ ifelse(is.numeric(df[[.]]),
               yes = tagList(sliderInput(
                 inputId = paste0(.),
                 label = .,
                 value = mean(df[[.]], na.rm = TRUE),
                 min = round(min(df[[.]], na.rm = TRUE),2),
                 max = round(max(df[[.]], na.rm = TRUE),2)
               )),
               no = tagList(selectInput(
                 inputId = paste0(.),
                 label = .,
                 choices = sort(unique(df[[.]])),
                 selected = sort(unique(df[[.]]))[1],
               ))
      )
    ))
  })
  

# creating dataframe of selected values to be displayed
  AllInputs <- reactive({
    id_exclude <- c("savebutton","submitbutton")
    id_include <- setdiff(names(input), id_exclude)
    
    if (length(id_include) > 0) {
      myvalues <- NULL
      for(i in id_include) {
        myvalues <- as.data.frame(rbind(myvalues, cbind(i, input[[i]])))
        
      }
      names(myvalues) <- c("Variable", "Selected Value")
      myvalues %>% 
        slice(match(names(DATA[,-1]), Variable))
    }
  })

  
# render table of selected values to be displayed
  output$show_inputs <- renderTable({
    AllInputs()
  })
  
 
# Creating a dataframe for calculating a prediction
  datasetInput <- reactive({  
    
    df1 <- data.frame(AllInputs(), stringsAsFactors = FALSE)
    input <- transpose(rbind(df1, names(DATA[1])))

    write.table(input,"input.csv", sep=",", quote = FALSE, row.names = FALSE, col.names = FALSE)
    test <- read.csv(paste("input.csv", sep=""), header = TRUE)
    
    
# defining factor levels for factor variables
    test[4] <- factor(test[4], levels = unique(DATA[,5])) # <- This line will cause problems if multiple factors in dataset or if different column location
   

# Making the actual prediction and store it in a data.frame     
    Prediction <- predict(model,test)
    Output <- data.frame("Prediction"=Prediction)
    print(format(Output, nsmall=2, big.mark=","))
  })
  

# display the prediction when the submit button is pressed
  output$tabledata <- renderTable({
    if (input$submitbutton>0) { 
      isolate(datasetInput()) 
    } 
  })
  
  
} # End server bracket



# ShinyApp -------------------------------------------------------------------------
shinyApp(ui, server)

Upvotes: 2

Views: 1138

Answers (1)

YBS
YBS

Reputation: 21349

To generalize the factor variables you can use the following code:

# defining factor levels for factor variables
#test[4] <- factor(test[4], levels = unique(DATA[,5])) # <- This line will cause problems if multiple factors in dataset or if different column location

cnames <- colnames(DATA[sapply(DATA,class)=="factor"])
if (length(cnames)>0){
  lapply(cnames, function(par) {
    test[par] <<- factor(test[par], levels = unique(DATA[,par]))
  })
}

You can apply this to BostonHousing2 data as shown below

# Read data
BH <- BostonHousing2
DATA <- BH

# Rearrange data so the response variable is located in column 1
#DATA <- DATA[,c(names(BH)[14],names(BH)[-14])]
DATA <- DATA[,c(names(BH)[5],names(BH)[-5])]   ## for BostonHousing2

# Creating a simple RF model
model <- randomForest(medv ~ ., data = DATA[,-2], ntree = 500, mtry = 4, importance = TRUE)


# UI -------------------------------------------------------------------------
ui <- fluidPage(
  
  sidebarPanel(
    
    h3("Parameters Selected"),
    br(),
    tableOutput('show_inputs'),
    hr(),
    actionButton("submitbutton", label = "calculate", class = "btn btn-primary", icon("calculator")),
    hr(),
    tableOutput("tabledata")
    
  ), # End sidebarPanel
  
  mainPanel(
    
    h3("Input widgets"),
    uiOutput("select")
    
  ) # End mainPanel
  
) # End UI bracket


# Server -------------------------------------------------------------------------
server <- function(input, output, session) {
  
  # Create input widgets from dataset  
  output$select <- renderUI({
    df <- req(DATA)
    tagList(map(
      names(df[-1]),
      ~ ifelse(is.numeric(df[[.]]),
               yes = tagList(sliderInput(
                 inputId = paste0(.),
                 label = .,
                 value = mean(df[[.]], na.rm = TRUE),
                 min = round(min(df[[.]], na.rm = TRUE),2),
                 max = round(max(df[[.]], na.rm = TRUE),2)
               )),
               no = tagList(selectInput(
                 inputId = paste0(.),
                 label = .,
                 choices = sort(unique(df[[.]])),
                 selected = sort(unique(df[[.]]))[1],
               ))
      )
    ))
  })
  
  
  # creating dataframe of selected values to be displayed
  AllInputs <- reactive({
    id_exclude <- c("savebutton","submitbutton")
    id_include <- setdiff(names(input), id_exclude)
    
    if (length(id_include) > 0) {
      myvalues <- NULL
      for(i in id_include) {
        myvalues <- as.data.frame(rbind(myvalues, cbind(i, input[[i]])))
        
      }
      names(myvalues) <- c("Variable", "Selected Value")
      myvalues %>% 
        slice(match(names(DATA[,-1]), Variable))
    }
  })
  
  
  # render table of selected values to be displayed
  output$show_inputs <- renderTable({
    AllInputs()
  })
  
  
  # Creating a dataframe for calculating a prediction
  datasetInput <- reactive({  
    
    df1 <- data.frame(AllInputs(), stringsAsFactors = FALSE)
    input <- transpose(rbind(df1, names(DATA[1])))
    
    write.table(input,"input.csv", sep=",", quote = FALSE, row.names = FALSE, col.names = FALSE)
    test <- read.csv(paste("input.csv", sep=""), header = TRUE)
    
    
    # defining factor levels for factor variables
    #test[4] <- factor(test[4], levels = unique(DATA[,5])) # <- This line will cause problems if multiple factors in dataset or if different column location

    cnames <- colnames(DATA[sapply(DATA,class)=="factor"])
    if (length(cnames)>0){
      lapply(cnames, function(par) {
        test[par] <<- factor(test[par], levels = unique(DATA[,par]))
      })
    }
    
    # Making the actual prediction and store it in a data.frame     
    Prediction <- predict(model,test)
    Output <- data.frame("Prediction"=Prediction)
    print(format(Output, nsmall=2, big.mark=","))
  })
  
  # display the prediction when the submit button is pressed
  output$tabledata <- renderTable({
    if (input$submitbutton>0) { 
      isolate(datasetInput()) 
    } 
  })
  
} # End server bracket

# ShinyApp -------------------------------------------------------------------------
shinyApp(ui, server)

Upvotes: 1

Related Questions