Romain B

Reputation: 139

Warning: Factor contains implicit NA

I am new to R and Shiny and I am trying to create an interactive plot with ggplot2. When the user checks the checkbox, they get access to a multiple-select field to customize the plot.

The original data frame contains missing values identified as "N/A" in the Publisher and Year columns. I removed the rows containing NAs with complete.cases, so there shouldn't be any NA left.
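For readers following along, here is a minimal sketch of that cleaning step on a tiny made-up table. The inline CSV and the names csv_text, raw and clean are hypothetical and are not taken from vgsales.csv. It assumes the "N/A" markers can be declared at read time through the na.strings argument of read.csv, which is an alternative to replacing them by hand afterwards.

```r
# Hypothetical stand-in for vgsales.csv, with "N/A" marking missing values.
csv_text <- "Name,Year,Publisher,Global_Sales
A,2006,Nintendo,82.7
B,N/A,Nintendo,1.2
C,2008,N/A,3.4
D,2009,Sega,5.6"

# Treat both "NA" and "N/A" as missing while reading.
raw <- read.csv(text = csv_text, na.strings = c("NA", "N/A"))

# Keep only the rows without any missing value.
clean <- raw[complete.cases(raw), ]

nrow(clean)
anyNA(clean)
```

With this approach Year is read as a number rather than text, so it would still need factor() if a discrete axis is wanted.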

I run my app: OK. I get to the default plot: OK. I check the checkbox: Warning: Factor 'Publisher' contains implicit NA, consider using 'forcats::fct_explicit_na'

I'd like to remove this warning, or at least understand it. If you have any additional comments, please share them: my goal is to get better.

app.R :

library(shiny)
library(dplyr)
library(ggplot2)

df<-read.csv("vgsales.csv")
df$Year[df$Year=="N/A"]<-NA
df$Year<-factor(df$Year)
df$Publisher[df$Publisher=="N/A"]<-NA
df$Publisher<-factor(df$Publisher)
df<-df[complete.cases(df),]

pubSales<-na.omit(df
    %>% group_by(Publisher, Year) 
    %>% summarise(Global_Sales=sum(Global_Sales))
)
pubSales<-pubSales[order(pubSales$Year),]

top5Pub<-head(unique(pubSales[order(-pubSales$Global_Sales),]$Publisher),5)

ui <- navbarPage("Video Games Sales",
    tabPanel("Publishers",
        mainPanel(
            titlePanel(
                title = "Publishers sales"
            ),
            sidebarPanel(
                radioButtons(
                    "pubOptions",
                    "Options",
                    c("Top 5 Publishers"="topFivePub",
                      "Custom Publishers"="customPub"),
                    selected="topFivePub"
                ),
                uiOutput("customPubUI")
            ),
            mainPanel(
                plotOutput("pubPlot")
            ),
            width=12
        )
    )
)

server <- function(input, output, session) {

    output$customPubUI<-renderUI({
        if(input$pubOptions=="customPub"){
            selectInput(
                "selectedPub",
                "Publishers",
                pubSales$Publisher,
                multiple=TRUE
            )
        }
    })

    output$pubSales<-renderTable(pubSales)
    output$pubPlot<-renderPlot({
        ggplot()+
            if(input$pubOptions=="customPub"){
                geom_line(
                    data=pubSales[pubSales$Publisher %in% input$selectedPub,],
                    aes(x=Year,y=Global_Sales,colour=Publisher,group=Publisher)
                )
            }else{
                geom_line(
                    data=pubSales[pubSales$Publisher %in% top5Pub,],
                    aes(x=Year,y=Global_Sales,colour=Publisher,group=Publisher)
                )
            }
    })

}

shinyApp(ui, server)

Upvotes: 2

Views: 16144

Answers (3)

George

Reputation: 1463

If your data frame contains unused factor levels, you can drop them with

pubSales <- droplevels(pubSales)

This removes the unused levels, and the warning goes away for me.
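As a minimal sketch of what droplevels does, using a made-up factor (the names publisher, kept and kept_clean are hypothetical): subsetting a factor keeps all of its original levels, and droplevels discards the ones that no longer occur.

```r
# Hypothetical toy factor, not the asker's data.
publisher <- factor(c("Nintendo", "Sega", "Atari"))

# Subsetting keeps every original level, even ones with no values left.
kept <- publisher[publisher != "Atari"]
levels(kept)

# droplevels() discards the levels that no longer occur.
kept_clean <- droplevels(kept)
levels(kept_clean)
```

Note that droplevels only removes unused levels. It does not turn NA values into a level, so whether it helps depends on where the warning comes from.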

Upvotes: 0

Colin H

Reputation: 660

The warning pops up because NA is not a level in a factor; it is just a missing value. The warning reminds you that there is a "hidden" NA category in the factor that is not among its levels and so will not show up when you perform operations on the factor.

For example, a basic factor:

a.factor <- as.factor(c('a', 'b', 'c', NA))

It shows only 3 levels when we print it or summarise it in a quick table:

> print(a.factor)
[1] a    b    c    <NA>
Levels: a b c

> table(a.factor)
a.factor
a b c 
1 1 1 
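To make the missing value visible, one option is sketched below in base R. The warning itself suggests forcats::fct_explicit_na; this sketch swaps in base R's addNA instead, and the name explicit and the label "(Missing)" are arbitrary choices made for illustration.

```r
a.factor <- as.factor(c('a', 'b', 'c', NA))

# NA is a value here, not a level.
nlevels(a.factor)
anyNA(a.factor)

# table() can be asked to count the missing value as well.
table(a.factor, useNA = "ifany")

# addNA() turns NA into a real level; renaming that level gives it a label.
explicit <- addNA(a.factor)
levels(explicit)[is.na(levels(explicit))] <- "(Missing)"
levels(explicit)
```

After this, the factor has four levels and no NA values, so the missing entries appear in tables and plot legends like any other category.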

Upvotes: 6

heck1

Reputation: 726

With:

require(shiny)
require(tidyverse)

# Create some sample data:
year <- rep(2000:2018, each=3)

publ <- rep(strrep(c("Pub 1", "Pub2", "pub3"), 1), 19)

Global_Sales <- rep(sample(1:100,19),3)
# Create an observation with NA:
newline <- c(NA, NA, 33)

df <- data.frame(Year = year, Publisher = publ, Global_Sales = Global_Sales)
df <- rbind(df,newline)
df <- na.omit(df)

pubSales<-df %>%  group_by(Publisher, Year)  %>%
  summarise(Global_Sales=sum(Global_Sales)) 

pubSales$Publisher <- as.character(pubSales$Publisher) 

the warning no longer appears. As long as the data you work with in Shiny does not contain factors (which is where the "implicit NA" comes from), the warning should not appear; it did not with my sample data.
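If several columns are factors, the same idea can be applied to all of them at once. This is a minimal base R sketch on made-up data; the data frame sales is hypothetical, and stringsAsFactors = TRUE is set only to force factor columns for the demonstration.

```r
# Hypothetical toy data; stringsAsFactors = TRUE forces factor columns.
sales <- data.frame(
  Publisher = c("Pub 1", "Pub2", "pub3"),
  Global_Sales = c(10, 20, 30),
  stringsAsFactors = TRUE
)

# Convert every factor column to character, leaving other columns alone.
sales[] <- lapply(sales, function(col) {
  if (is.factor(col)) as.character(col) else col
})

str(sales)
```

Assigning into sales[] keeps the object a data frame instead of replacing it with a plain list.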

Upvotes: 2
