mantanam
mantanam

Reputation: 27

R delete Dataframe columns with specific rows conditions

I have a dataframe with empty and NA values, like that:

> DF_datos
 
   V1  V2  V3  V4  V5 V6 
3 aaa aaa             NA
4  bb  bb  70  80     NA
5  cc  cc  80  80     NA
6 ddd ddd             NA  

I would like to delete all columns that contains an empty or NA value on its second or third row, without a loop over the dataframe. With subset function or something similar... The result I want would be:

> DF_datos
 
   V1  V2  V3  V4 
3 aaa aaa           
4  bb  bb  70  80   
5  cc  cc  80  80    
6 ddd ddd     

     
                                         

Upvotes: 1

Views: 92

Answers (2)

akrun
akrun

Reputation: 887891

We can use tidyverse with select from dplyr. In select, specify the logical expression in where to check if there are any non-NA (!is.na(.)) and any non-blank (nzchar) elements in the column to be selected

library(dplyr)
DF_datos %>%
      select(where(~ any(!is.na(.))&any(nzchar(.))))

-output

#   V1  V2 V3 V4
#3 aaa aaa      
#4  bb  bb 70 80
#5  cc  cc 80 80
#6 ddd ddd      

If we need to only check the 2nd or 3rd row

DF_datos %>%
  slice(2:3) %>% 
  select(where(~ any(!is.na(.))&any(nzchar(.)))) %>% 
  names %>% 
  select(DF_datos, .)

Or with Filter from base R (R 4.1.0) using the same logic

Filter(\(x) any(!is.na(x)) & any(nzchar(x)), DF_datos)

-output

#   V1  V2 V3 V4
#3 aaa aaa      
#4  bb  bb 70 80
#5  cc  cc 80 80
#6 ddd ddd      

Or for rows 2 and 3

Filter(\(x) any(!is.na(x)) & any(nzchar(x)), DF_datos[2:3,]) |>
      names() |>
      {\(x) subset(DF_datos, select = x)}()

Or use sum instead of any by checking if the sum of the compound logical expression is greater than 0

Filter(\(x) sum(!is.na(x) & nzchar(x)) > 0, DF_datos)

In earlier R versions use

Filter(function(x) any(!is.na(x)) & any(nzchar(x)), DF_datos)

NOTE: All of the above options are efficient as this loops over the columns and is memory efficient as this won't apply the expression on the whole dataset

Update

Based on the comments, the OP wanted to delete columns if there are any NA or blank in rows 2 or 3.

DF_datos$V6 <- c(NA, NA, 80, NA)

DF_datos %>%
  slice(2:3) %>% 
  select(where(~ all(!is.na(.)) & all(nzchar(.)))) %>% names %>% 
  select(DF_datos, .)

-output

  V1  V2 V3 V4
3 aaa aaa      
4  bb  bb 70 80
5  cc  cc 80 80
6 ddd ddd      

Or using Filter

Filter(\(x) all(!is.na(x)) & all(nzchar(x)), DF_datos[2:3,]) |>
       names() |>
       {\(x) subset(DF_datos, select = x)}()
#   V1  V2 V3 V4
#3 aaa aaa      
#4  bb  bb 70 80
#5  cc  cc 80 80
#6 ddd ddd      

data

DF_datos <- structure(list(V1 = c("aaa", "bb", "cc", "ddd"), V2 = c("aaa", 
"bb", "cc", "ddd"), V3 = c("", "70", "80", ""), V4 = c("", "80", 
"80", ""), V5 = c("", "", "", ""), V6 = c(NA, NA, NA, NA)), row.names = c("3", 
"4", "5", "6"), class = "data.frame")

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389275

Using colSums -

DF_datos[colSums(is.na(DF_datos) | DF_datos == '') != nrow(DF_datos)]

#   V1  V2 V3 V4
#3 aaa aaa      
#4  bb  bb 70 80
#5  cc  cc 80 80
#6 ddd ddd      

Another way to write this would be -

DF_datos[colSums(!is.na(DF_datos) & DF_datos != '') != 0]

Upvotes: 1

Related Questions