DTYK

Reputation: 1200

Piping the removal of empty columns using dplyr

I have a data frame of participant questionnaire responses in wide format, with each column representing a particular question/item.

The data frame looks something like this:

id <- c(1, 2, 3, 4)
Q1 <- c(NA, NA, NA, NA)
Q2 <- c(1, "", 4, 5)
Q3 <- c(NA, 2, 3, 4)
Q4 <- c("", "", 2, 2)
Q5 <- c("", "", "", "")
df <- data.frame(id, Q1, Q2, Q3, Q4, Q5)

I want R to remove every column whose values are all either (1) NA or (2) blank (""). That is, I want to drop column Q1 (which consists entirely of NAs) and column Q5 (which consists entirely of blanks in the form of "").

According to this thread, I can use the following to remove columns that consist entirely of NAs:

df[, !apply(is.na(df), 2, all)]

However, that solution does not address blanks (""). As I am doing all of this in a dplyr pipe, could someone also explain how I could incorporate the above code into a dplyr pipe?

At this moment, my dplyr pipe looks like the following:

df <- df %>%
    select(relevant columns that I need)

After that I'm stuck, and I fall back on bracket subsetting ([]) to keep the non-NA columns.

Thanks! Much appreciated.

Upvotes: 18

Views: 12584

Answers (3)

Richard Telford

Reputation: 9923

As of dplyr 1.0, you can use the helper where() inside select() instead of select_if(). (Note that Q1 and Q2 are swapped in the example data below relative to the question.)

library(tidyverse)
df <- data.frame(id = c(1, 2, 3, 4),
                 Q1 = c(1, "", 4, 5), 
                 Q2 = c(NA, NA, NA, NA),
                 Q3 = c(NA, 2, 3, 4), 
                 Q4 = c("", "", 2, 2), 
                 Q5 = c("", "", "", ""))

df %>% select(where(~ !(all(is.na(.)) | all(. == ""))))
#>   id Q1 Q3 Q4
#> 1  1  1 NA   
#> 2  2     2   
#> 3  3  4  3  2
#> 4  4  5  4  2
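One edge case worth checking: for a column that mixes NA and "" values (and nothing else), the predicate above appears to evaluate to NA rather than TRUE or FALSE, because `all(. == "")` returns NA when the comparison contains NAs and no FALSE. As far as I can tell, where() rejects a predicate that returns NA. A minimal sketch of an NA-safe variant follows; the helper name `is_all_empty` and the extra column `Q6` are hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical helper: TRUE when every value is NA or "".
# is.na(x) | x == "" is TRUE for NA entries because TRUE | NA is TRUE,
# so the result never contains NA.
is_all_empty <- function(x) all(is.na(x) | x == "")

df <- data.frame(id = c(1, 2, 3, 4),
                 Q1 = c(NA, NA, NA, NA),
                 Q2 = c(1, "", 4, 5),
                 Q6 = c(NA, "", NA, ""))  # hypothetical column mixing NA and ""

# Keep only the columns that are not entirely NA/blank
result <- df %>% select(where(~ !is_all_empty(.x)))
names(result)
#> [1] "id" "Q2"
```

Whether a mixed NA/blank column should count as empty depends on the data; here it is assumed that it should.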

Upvotes: 21

Ronak Shah

Reputation: 388982

We can use select_if with a predicate function:

library(dplyr)
df %>%
   select_if(function(x) !(all(is.na(x)) | all(x=="")))

#  id Q2 Q3 Q4
#1  1  1 NA   
#2  2     2   
#3  3  4  3  2
#4  4  5  4  2

Or with the formula shorthand instead of a full function definition:

df %>% select_if(~!(all(is.na(.)) | all(. == "")))

You can also modify your apply statement as follows:

df[!apply(df, 2, function(x) all(is.na(x)) | all(x==""))]

Or using colSums

df[colSums(is.na(df) | df == "") != nrow(df)]

or, equivalently, the inverse:

df[colSums(!(is.na(df) | df == "")) > 0]
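To see why the colSums version works, it may help to break it into its intermediate steps. The sketch below uses the data frame from the question; the variable names `empty`, `n_empty`, and `keep` are only illustrative.

```r
# Data frame from the question
df <- data.frame(id = c(1, 2, 3, 4),
                 Q1 = c(NA, NA, NA, NA),
                 Q2 = c(1, "", 4, 5),
                 Q3 = c(NA, 2, 3, 4),
                 Q4 = c("", "", 2, 2),
                 Q5 = c("", "", "", ""))

# Logical matrix: TRUE where a cell is NA or "".
# For NA cells, df == "" is NA, but TRUE | NA is TRUE, so no NA remains.
empty <- is.na(df) | df == ""

# Number of empty cells per column
n_empty <- colSums(empty)

# Keep a column unless every one of its rows is empty
keep <- n_empty != nrow(df)

result <- df[keep]
names(result)
#> [1] "id" "Q2" "Q3" "Q4"
```

Because the NA check and the blank check are combined cell by cell before counting, this form also drops a column that mixes NAs and blanks.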

Upvotes: 35

Nik Muhammad Naim

Reputation: 588

You can use select_if to do this.

Method:

col_selector <- function(x) {
  return(!(all(is.na(x)) | all(x == "")))
}


df %>% select_if(col_selector)

Output:

  id Q2 Q3 Q4
1  1  1 NA   
2  2     2   
3  3  4  3  2
4  4  5  4  2

Upvotes: 5
