Volkan Demir

Reputation: 57

Drop Multiple Columns in R

I have a data frame with 80k rows and 874 columns. Some of these columns are empty (every value is NA). I use sum(is.na()) in a for loop to find the indices of the empty columns. Since the first column is not empty, I use its number of rows as the reference: if sum(is.na()) for a column equals that count, the column is empty.

for (i in 1:ncol(loans)){
  if (sum(is.na(loans[i])) == nrow(loans[1])){
      print(i)
  }
}

Now that I know the indices of the empty columns, I want to drop them from the data. I thought about storing those indices in a vector and dropping the columns one at a time in a loop, but I don't think that will work: after each deletion the remaining columns shift left, so the stored indices would end up pointing at columns that contain data. How can I drop them?
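The index-vector idea does work if all the stored indices are dropped in a single subsetting step instead of one at a time. The sketch below uses a small toy data frame in place of the real `loans` data, and `empty_idx` is a hypothetical name chosen for illustration.

```r
# Toy stand-in for the real 80k x 874 `loans` data (assumed for illustration)
loans <- data.frame(a = c(NA, NA, NA), b = 1:3, c = c("x", NA, "z"), d = NA)

# Same test as the question's loop (every value is NA), collected into one vector
empty_idx <- which(vapply(loans, function(col) all(is.na(col)), logical(1)))

# Drop every listed column in one step, so no index shifts part-way through.
# Guard the empty case: loans[-integer(0)] would keep zero columns, not all of them.
if (length(empty_idx) > 0) {
  loans <- loans[-empty_idx]
}

print(names(loans))  # "b" "c"
```

The `length()` guard matters on real data: if no column happens to be empty, negative indexing with an empty vector silently selects nothing.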

Upvotes: 0

Views: 94

Answers (4)

Chris Ruehlemann

Reputation: 21400

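A dplyr solution that keeps only the columns with at least one non-NA value. Testing is.na() directly, rather than using colSums(), also works for character columns and does not drop columns that are all zeros:

library(dplyr)

# where() needs dplyr >= 1.0.0
df %>%
  select(where(~ !all(is.na(.x))))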
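A small base R check shows why a colSums()-based predicate is fragile on mixed data such as a loans table. The toy columns `empty`, `zeros` and `id` are assumptions for illustration.

```r
# Toy data (assumed): an empty column, an all-zero column and a character column
df <- data.frame(empty = c(NA, NA, NA), zeros = c(0, 0, 0), id = c("a", "b", "c"))

# colSums() needs numeric input, so the character column makes it stop with an error
res <- tryCatch(colSums(df, na.rm = TRUE), error = function(e) e)
print(inherits(res, "error"))  # TRUE

# Even on numeric columns alone, "sums to zero" also matches the column of zeros
print(colSums(df[c("empty", "zeros")], na.rm = TRUE) == 0)  # both TRUE

# Testing "every value is NA" keeps both the zero column and the character column
keep <- !vapply(df, function(col) all(is.na(col)), logical(1))
print(names(df)[keep])  # "zeros" "id"
```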

Upvotes: 1

Jeremy

Reputation: 876

You can solve almost any problem like this with basic tools such as if/else and for loops, although the drawback is that it is slower than a vectorised approach. Loop from the last column to the first, so that deleting a column never shifts the columns that are still to be checked:

# evaluate each column from last to first; if it is entirely NA, remove it
for (i in rev(seq_along(loans))) {
  if (all(is.na(loans[[i]]))) {
    loans[[i]] <- NULL
  }
}
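Deleting columns while walking forward is exactly the index-shift problem raised in the question. A minimal sketch with two adjacent empty columns (toy data assumed) shows a forward loop that deletes as it goes skipping one empty column and then failing on an index that no longer exists.

```r
# Toy data (assumed): two adjacent empty columns, b and c
loans <- data.frame(a = 1:3, b = NA, c = NA, d = 4:6)

# Forward loop that deletes as it goes; 1:length(loans) is fixed at 1:4 up front
status <- tryCatch({
  for (i in 1:length(loans)) {
    if (sum(is.na(loans[, i])) == nrow(loans)) {
      loans[, i] <- NULL
    }
  }
  "finished"
}, error = function(e) "error")

print(status)        # "error": after one deletion, column 4 no longer exists
print(names(loans))  # "a" "c" "d": column c slid into position 2 and was never checked
```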

Upvotes: 0

Karthik S

Reputation: 11584

Does this work?

df <- data.frame(col1 = rep(NA, 5),
                 col2 = 1:5,
                 col3 = rep(NA,5),
                 col4 = 6:10)
df
  col1 col2 col3 col4
1   NA    1   NA    6
2   NA    2   NA    7
3   NA    3   NA    8
4   NA    4   NA    9
5   NA    5   NA   10
# keep only the columns that are not entirely NA (works for any column type)
df <- df[, colSums(is.na(df)) < nrow(df), drop = FALSE]
df
  col2 col4
1    1    6
2    2    7
3    3    8
4    4    9
5    5   10

Another approach:

df[!apply(df, 2, function(x) all(is.na(x)))]
  col2 col4
1    1    6
2    2    7
3    3    8
4    4    9
5    5   10

Upvotes: 2

Fons MA

Reputation: 1282

You should try to provide a toy dataset for your question.

loans <- data.frame(
  a = c(NA, NA, NA),
  b = c(1,2,3),
  c = c(1,2,3),
  d = c(1,2,3),
  e = c(NA, NA, NA)
)


loans[!sapply(loans, function(col) all(is.na(col)))]

sapply() loops over the columns of loans and applies the anonymous function, which checks whether all elements of a column are NA. It then simplifies the results to a vector, in this case a logical one; negating it with ! gives the columns to keep.
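The intermediate logical vector can be inspected directly. This sketch reuses the same toy data; the name `is_empty` is chosen for illustration, and the vapply() line is an optional, type-stable variant rather than part of the original answer.

```r
# Same toy data as above
loans <- data.frame(
  a = c(NA, NA, NA),
  b = c(1, 2, 3),
  c = c(1, 2, 3),
  d = c(1, 2, 3),
  e = c(NA, NA, NA)
)

# One TRUE/FALSE per column, named after the columns (a and e are TRUE)
is_empty <- sapply(loans, function(col) all(is.na(col)))
print(is_empty)

# Negate it and use it as a column selector
print(names(loans[!is_empty]))  # "b" "c" "d"

# vapply() fixes the result type: on a data frame with no columns, sapply()
# returns an empty list (which `!` cannot negate), while vapply() returns logical(0)
print(vapply(loans[0], function(col) all(is.na(col)), logical(1)))
```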

The tidyverse option:

loans[!purrr::map_lgl(loans, ~all(is.na(.x)))]

Upvotes: 2
