MetalicSt33l

Reputation: 45

Handle missing values including NULL in R

I am trying to find the total count of all missing values, including NA, "", and NULL, per column in a data frame. The summary() function only reports NA values, and the VIM package does the same.

In the PASWR::titanic3 dataset, there are factor columns containing empty strings, which are not being captured in my missingness analysis.

What is a good approach to include these values in the counts? Additionally, is there a way to show each type of missing value and its frequency?

Thanks in advance.

Upvotes: 0

Views: 1439

Answers (2)

paoloeusebi

Reputation: 1086

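Simply convert the missing-value markers other than NA to NA, column by column (applying %in% to the whole data frame would compare entire columns rather than individual cells):

df[] <- lapply(df, function(col) replace(col, col %in% c("NULL", ""), NA))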
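
Once the markers are converted, the per-column totals can be read off with colSums(is.na(df)). Below is a minimal sketch that does the replacement column by column with lapply; the data frame, its column names, and its values are made up for illustration and are not taken from titanic3.

```r
# Hypothetical example data: a factor column and a character column
# containing a mix of "", "NULL", and NA
df <- data.frame(
  cabin = factor(c("C85", "", "NULL", NA)),
  boat  = c("2", "", "11", ""),
  stringsAsFactors = FALSE
)

# Convert "" and "NULL" to NA, one column at a time
df[] <- lapply(df, function(col) replace(col, col %in% c("NULL", ""), NA))

# Count the missing values in each column
missing_per_column <- colSums(is.na(df))
print(missing_per_column)
```

Note that the converted factor column still carries "" and "NULL" as unused levels; droplevels(df) would remove them if that matters.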

Upvotes: 1

Henry Cyranka

Reputation: 3060

You could use a user-defined function. Here is the one I came up with:

library(tidyverse)

test_function <- function(vector){
    ##TRUE where the element is NA, "", or the literal string "NULL".
    ##A real NULL cannot be an element of an atomic vector, so is.null() is not needed;
    ##%in% also returns FALSE (not NA) for NA elements.
    x <- is.na(vector) | vector %in% c("", "NULL")

    ##Returns the sum of the logical vector (FALSE = 0, TRUE = 1)
    return(sum(x))
}

To apply the function to every column of a data frame you can use any of the apply functions, but I recommend sapply, since it returns a named vector.

##Create a data frame with mock data

test_df <- tibble(x = c(NA, NA, NA, "","",1,2,3),
   y = c(NA, "","","","","","",1),
   z = c(0,0,0,0,0,0,0,0))

##Assign the result to a new variable
total_missing_by_column <- sapply(test_df, test_function)

##You can also build a data frame with the variables and the total missing

tibble(variable = colnames(test_df),
   total_missing = sapply(test_df, test_function))
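
To also see how often each kind of missing value occurs (the second part of the question), one option is to count each marker separately and tabulate the results per column. This is only a sketch in base R: missing_type_counts and mock_df are hypothetical names, and treating the string "NULL" as a missing marker is an assumption about the data.

```r
# Hypothetical helper: count each kind of missing marker in one column
missing_type_counts <- function(vector) {
  v <- as.character(vector)  # also handles factor columns
  c(
    na           = sum(is.na(v)),
    empty_string = sum(v == "", na.rm = TRUE),
    null_string  = sum(v == "NULL", na.rm = TRUE)
  )
}

# Mock data with a mix of NA, "", and "NULL"
mock_df <- data.frame(
  x = c(NA, NA, NA, "", "", "1", "2", "3"),
  y = c(NA, "", "", "", "", "", "NULL", "1"),
  stringsAsFactors = FALSE
)

# Result is a matrix: one row per missing type, one column per variable
missing_types <- sapply(mock_df, missing_type_counts)
print(missing_types)
```

colSums(missing_types) then gives the total per column, which should agree with a single combined count.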

Hope it helps

Upvotes: 1
