Araph7

Reputation: 71

Return name of empty columns from a list

I have a named list of data frames that all contain the same columns, but in some of these data frames some of the columns are empty (all NA). What I'm hoping to return is the name of each data frame in the list, together with the name(s) of its empty column(s).

The reprex below mirrors the process I am using on the full problem:

library(tidyverse)

data("diamonds") 

data1 <- diamonds 

data1$color <- NA

data1$price <- NA

data2 <- diamonds

data2$carat <- NA

data1$Type <- "data1"

data2$Type <- "data2"

data1%>%
  bind_rows(data2) -> dataFull

dataSplit <- split(dataFull, f = dataFull$Type)

for(i in dataSplit){
  
  which(sapply(dataSplit[[i]], function(x) all(is.na(x))))
  
}

My hope is to return something like:

data1: price, color

data2: carat

I've tried the very basic for loop included above, but for loops are admittedly not my strong suit.

Upvotes: 0

Views: 140

Answers (3)

akrun

Reputation: 887691

Using select with where, which keeps only the columns that are entirely NA:

library(dplyr)
library(purrr)
map(dataSplit, ~ .x %>% 
      select(where(~ all(is.na(.x)))) %>%
      names)
$data1
[1] "color" "price"

$data2
[1] "carat"

Or in base R:

 lapply(dataSplit, \(x) names(x)[!colSums(!is.na(x))])
$data1
[1] "color" "price"

$data2
[1] "carat"
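
The base R one-liner is compact, so it may help to unpack it. Below is a minimal sketch on a hypothetical toy data frame (`toy`, `non_missing`, and `empty_cols` are illustrative names, not part of the original answer): `colSums(!is.na(x))` counts the non-missing values per column, and a count of zero means the column is entirely NA.

```r
# Hypothetical toy data frame: column b is entirely NA, column c only partly.
toy <- data.frame(a = 1:3, b = NA, c = c(1, NA, 3))

# is.na() on a data frame returns a logical matrix, so colSums() of its
# negation gives the number of non-NA values in each column.
non_missing <- colSums(!is.na(toy))

# A column is "empty" when it has zero non-NA values.
# names(toy)[non_missing == 0] is equivalent to names(toy)[!non_missing].
empty_cols <- names(toy)[non_missing == 0]
empty_cols
# [1] "b"
```

Note that column `c` is not reported, because it still holds some non-missing values.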

Upvotes: 0

br00t

Reputation: 1614

library(tidyverse)

data("diamonds") 

data1 <- diamonds 

data1$color <- NA

data1$price <- NA

data2 <- diamonds

data2$carat <- NA

data1$Type <- "data1"

data2$Type <- "data2"

data1%>%
  bind_rows(data2) -> dataFull

dataSplit <- split(dataFull, f = dataFull$Type)

lapply(dataSplit, function(x) {
  cn <- colnames(x)
  isempty <- apply(x, 2, function(col) is.na(col) |> all())
  cn[ isempty ]
})

$data1
[1] "color" "price"

$data2
[1] "carat"

Upvotes: 2

Allan Cameron

Reputation: 174393

Your sapply idea was right, but you need to use its output to subset the names of each data frame. Also, since you are loading the tidyverse, you may as well use map instead of a loop for brevity:

map(dataSplit, ~ names(.x)[sapply(.x, \(x) all(is.na(x)))])
#> $data1
#> [1] "color" "price"
#> 
#> $data2
#> [1] "carat"
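
If you would rather keep the for loop and get the `data1: color, price` style of output, here is a rough sketch. It assumes a small hypothetical list called `frames` standing in for `dataSplit` so that it runs without the tidyverse; `report` is likewise an illustrative name. The main change from the loop in the question is iterating over the names of the list rather than over the data frames themselves, since `dataSplit[[i]]` cannot be indexed with a data frame, and storing the result so it is not discarded.

```r
# Hypothetical stand-in for dataSplit: a named list of small data frames.
frames <- list(
  data1 = data.frame(carat = c(0.2, 0.3), color = NA, price = NA),
  data2 = data.frame(carat = NA, color = c("E", "I"), price = c(326L, 334L))
)

report <- character(0)
for (nm in names(frames)) {  # loop over the names, not the data frames
  df <- frames[[nm]]
  # Logical vector per column: TRUE when every value is NA.
  is_empty <- sapply(df, function(col) all(is.na(col)))
  empty <- names(df)[is_empty]
  # Store one formatted line per data frame.
  report[nm] <- paste0(nm, ": ", paste(empty, collapse = ", "))
}

writeLines(report)
# data1: color, price
# data2: carat
```

The columns come out in their original order within each data frame, so `color` is listed before `price`.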

Upvotes: 4
