Reputation: 45
I am trying to find the total count of all missing values, including NA, "", and NULL, per column in a data frame. The summary() function only shows the NA values, and even the VIM package does the same.
In the PASWR::titanic3 dataset, there are factor columns containing empty strings, which are not being captured in my missingness analysis.
What is a good approach to include the counts of these missing values? Additionally, is there a way to show all the types/frequency of missing values?
Thanks in advance.
Upvotes: 0
Views: 1439
Reputation: 1086
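Simply convert the missing-value markers other than NA to NA, then count NA as usual. Compare elementwise with ==, because %in% does not test a data frame cell by cell:
df[df == "" | df == "NULL"] <- NA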
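As a minimal sketch of the recode-then-count idea, the snippet below builds the mask with elementwise comparison and then counts NA per column with colSums(is.na(df)). The data frame and its column names are made-up example data, not PASWR::titanic3.

```r
# Hypothetical example data: a factor column with "" and a character column with "NULL"
df <- data.frame(
  cabin = factor(c("C85", "", NA, "")),
  boat  = c("NULL", "2", NA, "11"),
  age   = c(29, NA, 2, 30),
  stringsAsFactors = FALSE
)

# Recode the "" and "NULL" markers to NA (elementwise comparison on every column)
df[df == "" | df == "NULL"] <- NA

# Per-column count of missing values after recoding
missing_per_column <- colSums(is.na(df))
print(missing_per_column)
```

Note that this overwrites the original markers, so keep a copy of the data frame if you still need to tell "" and NA apart later.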
Upvotes: 1
Reputation: 3060
You should try using a user-defined function. Here is the one I came up with:
library(tidyverse)
test_function <- function(vector) {
  ## TRUE where the element is NA or "". A column can never hold a NULL
  ## element, and is.null() tests the whole vector rather than each element,
  ## so it is not needed here.
  x <- is.na(vector) | vector == ""
  ## Returns the sum of the logical vector (FALSE = 0, TRUE = 1)
  return(sum(x))
}
To apply the function to a data frame you can use any of the apply functions, but I recommend sapply, since it returns a named vector with one count per column.
##Create a data frame with mock data
test_df <- tibble(x = c(NA, NA, NA, "", "", 1, 2, 3),
                  y = c(NA, "", "", "", "", "", "", 1),
                  z = c(0, 0, 0, 0, 0, 0, 0, 0))
##Assign the result to a new variable
total_missing_by_column <- sapply(test_df, test_function)
##You can also build a data frame with the variables and the total missing
tibble(variable = colnames(test_df),
       total_missing = sapply(test_df, test_function))
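For the second part of the question (showing the types and frequency of missing values), a rough sketch along the same lines is to count each marker separately. The helper name count_missing_types and the mock data are hypothetical, and it assumes that "NULL" appears as a literal string, since a column cannot contain real NULL elements. It uses base R only.

```r
# Hypothetical helper: counts each kind of missing marker in one column
count_missing_types <- function(column) {
  values <- as.character(column)  # handles factor, character and numeric columns
  c(
    na          = sum(is.na(values)),
    empty       = sum(values == "", na.rm = TRUE),
    null_string = sum(values == "NULL", na.rm = TRUE)
  )
}

# Mock data (not the real titanic3 dataset), with text columns stored as factors
example_df <- data.frame(
  x = c(NA, NA, NA, "", "", "1", "2", "3"),
  y = c(NA, "", "", "", "", "NULL", "", "1"),
  z = c(0, 0, 0, 0, 0, 0, 0, 0),
  stringsAsFactors = TRUE
)

# One row per column of the data frame, one count per marker type, plus a total
missing_types <- t(sapply(example_df, count_missing_types))
missing_types <- cbind(missing_types, total = rowSums(missing_types))
print(missing_types)
```

Keeping the counts separate shows at a glance which columns use "" rather than NA, which is what summary() hides.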
Hope it helps
Upvotes: 1