user14380579

Checking all columns in a data frame for missing values in R

I have a data frame, books, and I'm trying to loop through all of its columns and return something like "Missing" for each column that has any missing values.

Below is my code. For each column, is.na() returns a logical vector marking which elements are missing. I then check whether TRUE appears anywhere in that vector, which means the column has at least one missing value.

This works.

However, since I'm new to R, I suspect there are better ways of doing this that I'm not aware of.

for (col in colnames(books)) {
  bool <- is.na(books[[col]])
  if (TRUE %in% bool) {
    print("Missing")
  } else {
    print("Fine")

  }
}
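A minimal sketch of the same loop, assuming a small made-up data frame in place of books (which isn't shown in the question). It uses any() instead of TRUE %in% and stores each result under its column name instead of printing it:

```r
# Hypothetical stand-in for the question's books data frame
books <- data.frame(
  title  = c("A", "B", NA),
  pages  = c(120, 340, 210),
  rating = c(4.5, NA, 3.8)
)

# Named character vector to hold one status per column
status <- character(0)
for (col in colnames(books)) {
  # any() is TRUE if at least one element of the logical vector is TRUE
  status[col] <- if (any(is.na(books[[col]]))) "Missing" else "Fine"
}
print(status)
```

Assigning by a new name (status[col] <- ...) grows the vector, so the result is labelled by column.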

Upvotes: 1

Views: 10068

Answers (5)

Elvin Aghammadzada

Reputation: 881

Another way to count them, using the dplyr library:

library(dplyr)

mtcars %>%
  select(everything()) %>%  # replace everything() with the columns you need
  summarize(across(everything(), ~ sum(is.na(.))))
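mtcars contains no missing values, so every count in the dplyr result is 0. For comparison, here is a base-R sketch of the same per-column count using vapply() on a small made-up data frame that does contain NAs (the data are hypothetical):

```r
# Hypothetical data frame with a few missing values
df <- data.frame(
  x = c(1, NA, 3, NA),
  y = c("a", "b", NA, "d"),
  z = c(TRUE, FALSE, TRUE, FALSE)
)

# vapply() applies the function to each column and checks that every
# result is a single integer, returning a named integer vector
na_counts <- vapply(df, function(col) sum(is.na(col)), integer(1))
print(na_counts)
```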

Upvotes: 1

LucaCoding

Reputation: 65

The following code helped me a lot.

This function shows the percentage of missing values in each column of your data frame, df:

# percentage of NA values in a vector (one column)
p <- function(x) {sum(is.na(x)) / length(x) * 100}
apply(df, 2, p)
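apply() first converts the data frame to a matrix, which coerces all columns to one common type when the column types are mixed. That is harmless for is.na(), but the conversion can be avoided by calling colMeans() on the logical matrix that is.na() returns. A sketch comparing both on a hypothetical data frame:

```r
# Hypothetical data frame: 4 rows, mixed column types
df <- data.frame(
  id    = 1:4,
  score = c(10, NA, 30, NA),
  label = c("a", NA, "c", "d")
)

# Percentage of NAs per column, using p() with apply()
p <- function(x) {sum(is.na(x)) / length(x) * 100}
pct_apply <- apply(df, 2, p)

# Same result without converting df to a matrix: is.na() on a data
# frame returns a logical matrix, and colMeans() averages each column
pct_colmeans <- colMeans(is.na(df)) * 100

print(pct_colmeans)
```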

Here I: 1. find the rows that contain missing values; 2. store their row indices in a vector; 3. remove those rows from df.

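which(!complete.cases(df))                 # row indices of rows with at least one NA
na_df <- which(!complete.cases(df))        # store them in a vector
df1 <- if (length(na_df) > 0) df[-na_df, , drop = FALSE] else df  # drop them; df[-integer(0), ] would drop every row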

The last line creates a new data frame, df1, that contains only the complete rows.
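One caveat with df[-na_df, ]: when no row has a missing value, which() returns integer(0), and negative indexing with an empty vector selects no rows at all. A small sketch on a hypothetical data frame shows the effect, plus two base-R alternatives, logical indexing with complete.cases() and na.omit():

```r
# Hypothetical data frame with no missing values at all
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

na_rows <- which(!complete.cases(df))  # integer(0): no incomplete rows

# Negative indexing with an empty vector selects nothing, so every row is lost
dropped_all <- df[-na_rows, ]

# Logical indexing keeps exactly the complete rows, even when none need dropping
kept <- df[complete.cases(df), ]

# na.omit() also keeps only the complete rows
kept_omit <- na.omit(df)

print(nrow(dropped_all))
print(nrow(kept))
```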

All the best

Upvotes: 0

ThomasIsCoding

Reputation: 101064

The colSums answer by @akrun is super efficient. Here is another implementation for your purpose, which returns TRUE for each column that contains at least one NA:

seq(ncol(books)) %in% unique(which(is.na(books), arr.ind = TRUE)[, "col"])
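which(..., arr.ind = TRUE) returns a two-column matrix with the row and column position of every TRUE in is.na(books); its "col" column lists the columns that hold NAs. A sketch on a hypothetical stand-in for books, with setNames() added (not part of the answer) so the result is labelled by column name:

```r
# Hypothetical stand-in for books
books <- data.frame(
  title  = c("A", "B", NA),
  pages  = c(120, 340, 210),
  rating = c(4.5, NA, NA)
)

# One row per NA: its row and column position in the data frame
na_pos <- which(is.na(books), arr.ind = TRUE)
print(na_pos)

# TRUE for every column index that appears at least once in na_pos[, "col"]
has_na <- seq(ncol(books)) %in% unique(na_pos[, "col"])
has_na <- setNames(has_na, colnames(books))
print(has_na)
```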

Upvotes: 0

akrun

Reputation: 886938

Calling colSums on a logical matrix counts the TRUE values in each column (TRUE is treated as 1 and FALSE as 0). Comparing those counts with > 0 then gives a logical vector that is TRUE for every column with at least one missing value:

colSums(is.na(books)) > 0 
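A sketch of the same idea on a hypothetical stand-in for books, which also pulls out the names of the columns that contain NAs:

```r
# Hypothetical stand-in for books
books <- data.frame(
  title  = c("A", "B", NA),
  pages  = c(120, 340, 210),
  rating = c(4.5, NA, NA)
)

# is.na() on a data frame returns a logical matrix (one column per variable)
na_counts <- colSums(is.na(books))   # title 1, pages 0, rating 2
has_na <- na_counts > 0              # logical vector named by column

# Names of the columns that contain at least one NA
names(has_na)[has_na]
```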

Upvotes: 0

Gregor Thomas

Reputation: 145755

The anyNA function is built for this. You can apply it to all columns of a data frame with sapply(books, anyNA). To count NA values, akrun's suggestion of colSums(is.na(books)) is good.
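A sketch of sapply(books, anyNA) on a hypothetical stand-in for books, followed by ifelse() to reproduce the "Missing"/"Fine" labels from the question's loop without writing a loop:

```r
# Hypothetical stand-in for books
books <- data.frame(
  title  = c("A", "B", NA),
  pages  = c(120, 340, 210),
  rating = c(4.5, NA, 3.8)
)

# anyNA() returns a single TRUE/FALSE per column; sapply() keeps the column names
has_na <- sapply(books, anyNA)

# ifelse() keeps the names of has_na, giving one label per column
status <- ifelse(has_na, "Missing", "Fine")
print(status)
```

anyNA() can stop at the first NA it finds, whereas is.na() builds a full logical vector first, which is one reason it suits a yes/no check.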

Upvotes: 5
